Nucleic acid molecules encoding an enzyme involved in very long chain fatty acid (VLCFA)
elongation in plants are disclosed. The invention includes a cDNA, genomic clone and encoded
protein, as well as plants having modified VLCFA composition, such as modified epicuticular
waxes, and methods of mak-

NUCLEIC ACIDS ENCODING A PLANT ENZYME INVOLVED IN VERY
LONG CHAIN FATTY ACID SYNTHESIS

Technical Field

This invention relates to DNA molecules cloned from plants and methods of using such DNA 
molecules to produce transgenic plants with altered fatty acid composition. 

Background 

Epicuticular waxes form the outermost layer of the aerial portion of the plant and are thus the
first line of interaction between the plant and its environment. The physical properties of this wax layer
protect the plant from numerous environmental stresses. For example, the hydrophobic nature of wax
prevents dehydration (nonstomatal water loss) and aids in shedding rainwater. The reflective nature of
wax protects the plant against UV radiation (Reicosky and Hanover, 1978). Waxes are also known to
protect against acid rain (Percy and Baker, 1990) and, because they are a good solvent for organic
pollutants, they are able to impede the uptake of aqueous foliar sprays (Schreiber and Schonherr, 1992).
Furthermore, surface waxes protect plants from bacterial and fungal (Jenks et al., 1994) pathogens and
play a role in plant-insect interactions (Eigenbrode and Espelie, 1995). Recently it has been shown that
some of the compounds found in epicuticular waxes are also present in the tryphine layer of pollen grains
(Preuss et al., 1993). Without these compounds the tryphine layer erodes, resulting in pollen that is
unable to function, causing male sterility.

Epicuticular waxes are composed of long chain, hydrophobic compounds all derived from
saturated very long chain fatty acids (VLCFAs), that are synthesized within and then secreted from the
epidermis. VLCFAs are defined as those fatty acids whose chain length is 20 or more carbons long.
The lengths will vary from plant to plant, but typically, the wax VLCFAs are approximately 26-34
carbons long. These VLCFAs are synthesized by a microsomal fatty acid elongation (FAE) system by
sequential additions of C2 moieties from malonyl-coenzyme A (CoA) to pre-existing fatty acids derived
from the de novo fatty acid synthesis (FAS) pathway of the plastid. Analogous to de novo FAS it is
thought that each cycle of FAE involves four enzymatic reactions: (1) condensation of malonyl-CoA with
a long chain acyl-CoA, (2) reduction to β-hydroxyacyl-CoA, (3) dehydration to an enoyl-CoA and (4)
reduction of the enoyl-CoA, resulting in the elongated acyl-CoA (Fehling and Mukherjee, 1991).
Together these four activities are termed the elongase (von Wettstein-Knowles, 1982). VLCFAs in the
epidermis are then converted to the other wax components through a number of pathways consisting of
multienzyme complexes. For example VLCFAs are converted to aldehydes by fatty acyl-CoA reductase
(Kolattukudy, 1971). These aldehydes can either be reduced by aldehyde reductase to produce primary
alcohols (Kolattukudy, 1971), or decarbonylated by an aldehyde decarbonylase to produce odd chained
alkanes (Cheesbrough and Kolattukudy, 1984). Alkanes can then undergo oxidation to form firstly
secondary alcohols and then ketones (for review see Post-Beittenmiller, 1996). Very little is known at
the molecular level about the components that are involved in the biosynthesis of wax specific
compounds and their secretion onto the plant surface. Genetic studies have shown that there are a large
number of genes involved in these processes (for example, 22 loci have been reported in Arabidopsis, 84
in barley). However only a few of these genes have been isolated so far and the biochemical role of
their gene products remains unknown (Lemieux, 1996).
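
By way of illustration only, the elongation arithmetic described above (each FAE cycle adds one C2 unit through four enzymatic reactions) may be sketched as follows. The function names and the C18 starting chain length are assumptions chosen for the example and are not taken from the disclosure.

```python
# Illustrative sketch only. The C18 starting length used below is an
# assumption for the example; the text states only that pre-existing
# fatty acids from de novo FAS are elongated.

# The four reactions of one FAE cycle, as listed in the text.
FAE_STEPS = (
    "condensation of malonyl-CoA with a long chain acyl-CoA",
    "reduction to beta-hydroxyacyl-CoA",
    "dehydration to an enoyl-CoA",
    "reduction of the enoyl-CoA",
)


def is_vlcfa(chain_length: int) -> bool:
    """A VLCFA is defined in the text as a fatty acid of 20 or more carbons."""
    return chain_length >= 20


def elongation_cycles(start_length: int, target_length: int) -> int:
    """Number of FAE cycles needed when each cycle adds one C2 unit."""
    added = target_length - start_length
    if added < 0 or added % 2 != 0:
        raise ValueError("target must be reachable by adding whole C2 units")
    return added // 2


if __name__ == "__main__":
    # Typical wax VLCFA lengths quoted in the text fall between C26 and C34.
    for target in (26, 30, 34):
        cycles = elongation_cycles(18, target)
        reactions = cycles * len(FAE_STEPS)
        print(f"C18 -> C{target}: {cycles} cycles, {reactions} reactions")
```

Under these assumptions, elongating a C18 precursor to a C26 wax VLCFA takes four cycles, that is, sixteen enzymatic reactions, of which four are condensations.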

In addition to being made in the epidermal cells, VLCFAs also accumulate in the seed oil of
some plant species. To date, developing seeds have been the primary focus of research into VLCFA
biosynthesis. In seeds VLCFAs are incorporated into triacylglycerols (TAGs), as in the Brassicaceae, or
into wax esters, as in Jojoba. The seed VLCFAs include the agronomically important erucic acid
(C22:1), with oils containing this fatty acid used in the manufacture of lubricants, nylon, cosmetics,
pharmaceuticals and plasticisers (Battey et al., 1989; Johnston and Fritz, 1989). Conversely, VLCFAs
have detrimental nutritional effects and are therefore undesirable in edible oils. This has led to the
breeding of Canola rapeseed varieties that are almost devoid of VLCFAs (Stefansson et al., 1961).

The seeds of Arabidopsis contain approximately 28% (wt/wt of total fatty acids (FA)) of
VLCFAs, eicosenoic acid (20:1) being the predominant VLCFA (21% wt/wt of total FA). To
identify the gene products that are involved in the synthesis of seed VLCFAs and establish the VLCFA
biosynthetic pathway, several groups performed mutational analysis and screened for seed that had
reduced VLCFA content. Each group independently identified the FATTY ACID ELONGATION1 gene
(FAE1; James and Dooner, 1990; Kunst et al., 1992; Lemieux et al., 1990). A mutation at this locus
resulted in reduced VLCFA levels (< 1% wt/wt of total FA) in the seed. Several other mutations that
were non-allelic to FAE1 were also isolated. However, these mutations had a less pronounced effect in
that VLCFAs still constituted 6.7% (wt/wt of total FA) of the seed fatty acid (Katavic et al., 1995;
Kunst et al., 1992). Thus, despite the fact that four enzymatic activities are required for each elongation
step, the FAE1 gene was the only one found by mutant analysis that resulted in almost complete loss of
VLCFA synthesis in the seed.

The Arabidopsis FAE1 gene was subsequently cloned (James et al., 1995; WO 96/13582), and
showed homology to three condensing enzymes: chalcone synthase, stilbene synthase and
β-ketoacyl-[acyl carrier protein] synthase III (17 amino acids were identical to a 50 amino acid region
of a consensus sequence for condensing enzymes). Based on this homology it was proposed that FAE1
encodes a β-ketoacyl-coenzyme A synthase (KCS), the condensing enzyme which catalyzes the first
reaction of the microsomal fatty acid elongation system (James et al., 1995). As determined by
Northern analysis, the FAE1 gene is expressed in seeds of Arabidopsis, but is absent from leaves (James
et al., 1995). This result is consistent with the fact that the fae1 mutation affects only the fatty acid
composition of the developing seed, having no pleiotropic effects on fatty acid composition of the
vegetative, or floral parts of the plant. Thus, FAE1 is regarded as a seed-specific condensing enzyme.

Recently a cDNA from Jojoba seeds involved in the synthesis of VLCFAs has been isolated
(Lassner et al., 1996; WO 95/15387). The protein encoded by this cDNA showed high homology to
FAE1 (52% amino acid identity), and biochemical analysis demonstrated that it has a KCS activity.
Using Jojoba KCS cDNA, Lassner et al. (1996) were able to complement the mutation in a Canola
variety of Brassica napus, restoring a low erucic acid rapeseed line to a line that contained higher levels
of VLCFAs. This suggests that in Canola, the mutation is in the structural gene encoding KCS, or a
gene affecting KCS activity. Thus, both in Arabidopsis and Brassica napus, the mutations that result in
the abolition of VLCFA synthesis seem to affect the condensing enzyme.

If four enzyme activities are necessary for an elongation step, and FAE1 and Jojoba KCS only
encode the KCS activity, one might expect to find other complementation groups that result in very low
levels of VLCFA synthesis. Because these complementation groups were not found in mutation
screenings, Millar and Kunst (1997) have hypothesized that the other three activities are not seed
specific, but ubiquitously present throughout the plant and shared with other FAE systems involved in
VLCFA formation, including wax biosynthesis. To test this, FAE1 was ectopically expressed in yeast and
in tissues of Arabidopsis and tobacco, where significant quantities of VLCFAs are not found. Expression
of FAE1 alone in these cells resulted in the biosynthesis and accumulation of VLCFAs. This
demonstrated that the condensing enzyme is the pivotal control point of the elongase, controlling not only
the amounts of VLCFAs produced, but also their chain lengths. In contrast, it appears that the other
three enzyme activities of the elongase are found ubiquitously throughout the plant, are not rate limiting
and play no role in the control of VLCFA synthesis. The ability of yeast containing FAE1 to synthesize
VLCFAs suggests that the expression, and the acyl chain length specificity of the condensing enzyme,
along with the apparent broad specificities of the other three FAE activities, may be a universal eukaryotic
mechanism for regulating the amounts and acyl chain length of VLCFAs synthesized in any given cell
(Millar and Kunst, 1997).

Thus, considering the central role of the condensing enzyme for VLCFA synthesis, the isolation
of genes encoding condensing enzymes involved in the production of wax specific VLCFAs would
facilitate the modification of wax composition through genetic engineering. Furthermore, since the
majority of wax components are derived from VLCFAs, the availability of such genes would offer the
potential to modify the wax load itself. This offers the potential to modify the susceptibility of plants to
environmental stresses such as ultraviolet light, heat and drought, as well as the ability of plants to
withstand insects and pathogens. The present invention is directed towards nucleic acids that encode
condensing enzymes for VLCFA synthesis.

Summary of the Invention 
The present invention provides nucleic acids (cDNAs and genomic clones) that encode a key
enzyme in the synthesis of VLCFAs in plant epidermal cells. The activity of this enzyme is referred to
as very long chain fatty acid elongase; the activity is required for synthesis of VLCFAs of greater than
24 carbons in length. It is shown that co-suppression of the CUT1 gene in plants can disrupt VLCFA
synthesis which results in plants having none of the protective wax usually found on stem surfaces. In
addition, it is shown that such plants are conditionally male sterile: when grown under normal humidity,
the plants are male sterile, but fertility can be restored by growth in an elevated humidity environment.






WO 98/46766



PCT/CA98/00343 



The invention thus provides the CUT1 cDNA and gene nucleotide sequences ("CUT1 nucleic
acids") and the amino acid sequence of the CUT1 protein. In one embodiment, the CUT1 nucleic acids
disclosed are from Arabidopsis thaliana. The open reading frame of the Arabidopsis CUT1 cDNA
molecule encodes an enzyme of 497 amino acids which catalyzes the addition of 2C units to pre-existing
C24 or longer fatty acids.

Also encompassed within the scope of this invention are transformation vectors that include at
least a portion of the CUT1 nucleic acid molecules. Such vectors may be transformed into plants to
produce transgenic plants with modified VLCFA compositions (relative to non-transgenic plants of the
same species). Depending on the particular sequences incorporated into the vector, transformation with
the CUT1 cDNA, gene or derivatives thereof can be used to modify agronomically important traits,
including the presence, composition and thickness of epicuticular wax layers on leaves and stems, seed
coat fatty acids, seed oil composition and male sterility. Typically, such vectors include regulatory
sequences, such as promoters, operably linked to the CUT1 open reading frame or a derivative of the
CUT1 nucleic acids. For example, VLCFA synthesis may be altered by introducing into a plant a
transformation vector that includes a sense or antisense version of the CUT1 cDNA. Transgenic plants
having modified VLCFA compositions and which are transformed with such recombinant transformation
vectors are also provided by this invention.

In one aspect of the invention, transformation with sense or antisense versions of the CUT1
nucleic acids may be used to produce plants having modified epicuticular wax layers on the aerial parts
of the plants, such as the leaves and stems. An epicuticular wax layer may be modified in
physical respects, such as thickness of the wax layer, or in composition. Because these layers play a
role in the ability of plants to resist environmental stresses, such as drought and ultraviolet light, as well
as insects and pathogens, transformation with vectors including forms of the CUT1 nucleic acids may be
used to produce plants with particular agronomic advantages. Producing plants with modified
epicuticular wax composition may be achieved by introducing into the plants a vector in which the CUT1
nucleic acid (or a derivative thereof) is operably linked to a promoter that directs expression of the open
reading frame in the epidermal cells. The CaMV 35S promoter and the endogenous CUT1 gene
promoter are examples of regulatory sequences that may be suitable for this purpose.

Agronomically important traits in addition to wax composition may also be modified using the
CUT1 nucleic acids of the present invention. For example, the fatty acid composition of the seed coat
and the fatty acid composition of seed oil may be modified by transforming plants with the CUT1 cDNA
or derivatives thereof. Preferably, where it is desired to modify aspects of seed VLCFA composition,
the introduced CUT1 nucleic acid sequence will be operably linked to a promoter known to direct
expression in seed tissues. Seed-specific promoters include the napin promoter of Brassica napus (Lee
et al., 1991). In addition, transformation with the CUT1 nucleic acids or derivatives thereof may be
used to disrupt VLCFA synthesis in pollen, resulting in conditionally male sterile plants. Such plants are
useful in plant breeding programs.

While the invention provides CUT1-encoding nucleic acids from Arabidopsis, it additionally
encompasses homologs, orthologs and variants and derivatives of these sequences, as well as homologs,
orthologs and variants of the CUT1 polypeptide sequence. Thus, in one aspect of the invention, nucleic
acid molecules that comprise specified regions of these sequences are provided. Exemplary of such nucleic
acid molecules are oligonucleotides that are useful as probes or primers to detect and amplify
CUT1-encoding nucleic acids from other plant species. Such oligonucleotides are useful as hybridization
probes or PCR primers, and typically comprise at least 15 consecutive bases of the disclosed CUT1 nucleic
acid sequences. In other embodiments, such oligonucleotides comprise longer regions of the disclosed CUT1
sequences, such as at least 20, 25 or 30 consecutive nucleotides.

In another aspect, the invention provides compositions and methods for isolating nucleic acid
sequences that encode enzymes having CUT1 activity from other plant species. Typically, such methods
involve hybridizing probes or primers derived from the disclosed Arabidopsis sequences to nucleic acids
obtained or derived from such other plant species.

Homologous and orthologous sequences to Arabidopsis CUT1 nucleic acid and CUT1 amino acid
sequences share key functional and structural characteristics with the disclosed Arabidopsis sequences.
Functionally, such sequences encode (or comprise) a polypeptide that catalyzes the very long chain fatty
acid elongation as described above. Structurally, such sequences share a specified structural relationship
with the disclosed sequences. By way of example, in certain embodiments, homologous amino acid
sequences have at least 70% sequence identity with the Arabidopsis CUT1 amino acid sequence. In other
embodiments, homologous nucleic acid sequences hybridize under stringent conditions to the disclosed
Arabidopsis CUT1 nucleic acid sequences.

Another aspect of the invention relates to the purified CUT1 enzyme itself. Having provided
nucleic acid molecules that encode this enzyme, the invention also facilitates the expression of CUT1
enzyme in heterologous systems, including E. coli, yeast and baculovirus expression systems. Thus, the
invention permits the large scale production of the enzyme for agricultural and other applications.

In another aspect of the invention the promoter sequence of the CUT1 gene is disclosed. This
promoter sequence confers epidermis-specific expression, and may be used to express a variety of nucleic
acids in an epidermis-specific manner.

Detailed Description of the Invention 

I. Definitions

Unless otherwise noted, technical terms are used according to conventional usage. Definitions
of common terms in molecular biology may be found in Benjamin Lewin, Genes V, published by Oxford
University Press, 1994 (ISBN 0-19-854287-9); Kendrew et al. (eds.), The Encyclopedia of Molecular
Biology, published by Blackwell Science Ltd., 1994 (ISBN 0-632-02182-9); and Robert A. Meyers (ed.),
Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers,
Inc., 1995 (ISBN 1-56081-569-8). The nomenclature for DNA bases as set forth at 37 CFR § 1.822 and
the standard three letter codes for amino acid residues are used herein.

In order to facilitate review of the various embodiments of the invention, the following
definitions of terms are provided:

CUT1 protein: The defining functional characteristic of a CUT1 protein is its enzymatic
activity, specifically its very long chain fatty acid elongase activity. This activity is manifested as the
catalysis of one or more steps in the addition of 2 carbon moieties (such as malonyl-coenzyme A) to
pre-existing very long chain fatty acids (VLCFAs). In a preferred embodiment, a CUT1 protein catalyzes
one or more steps in the addition of 2 carbon moieties to pre-existing long chain fatty acids of at least 24
carbon units in length. This activity can be measured by the assay described below.

This invention provides a cDNA and a gene encoding a CUT1 enzyme from Arabidopsis
thaliana. However the invention is not limited to this particular CUT1 protein: other nucleotide sequences
which encode CUT1 proteins are also part of the invention, including variants on the disclosed
Arabidopsis cDNA and gene sequences and orthologous sequences from other plant species, including
naturally occurring variants, such as sequences from other ecotypes, species and natural polymorphisms,
the cloning of which is now enabled. Such sequences share the essential functional characteristic of
encoding an enzyme having very long chain fatty acid elongase activity. Nucleic acid sequences that
encode CUT1 proteins and the proteins encoded by such nucleic acids share not only this functional
characteristic, but also a specified level of sequence similarity (or sequence identity), as addressed below.
The concept of sequence identity can also be expressed in the ability of two sequences to hybridize to each
other under stringent conditions.

Sequence identity: the similarity between two nucleic acid sequences, or two amino acid
sequences, is expressed in terms of the similarity between the sequences, otherwise referred to as sequence
identity. Sequence identity is frequently measured in terms of percentage identity (or similarity or
homology); the higher the percentage, the more similar the two sequences are.
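
By way of illustration only, percentage identity over an existing pairwise alignment may be computed as sketched below. This is a minimal sketch and not the BLAST implementation: it assumes the two sequences have already been aligned and padded with "-" gap characters, and the short peptides used are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Columns where both sequences have a gap are ignored; a residue aligned
    against a gap counts as a compared, non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # no information in a double-gap column
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("alignment contains no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical five-residue peptides differing at one position.
    print(percent_identity("MKTAY", "MKSAY"))
```

A threshold such as "at least 70% sequence identity" can then be checked by comparing the returned value against 70.0.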

Methods of alignment of sequences for comparison are well-known in the art. Various programs
and alignment algorithms are described in: Smith and Waterman (1981); Needleman and Wunsch (1970);
Pearson and Lipman (1988); Higgins and Sharp (1988); Higgins and Sharp (1989); Corpet et al. (1988);
and Pearson et al. (1994). Altschul et al. (1994) presents a detailed consideration of sequence alignment
methods and homology calculations.

The NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., 1990) is available from
several sources, including the National Center for Biological Information (NCBI, Bethesda, MD) and on
the Internet, for use in connection with the sequence analysis programs blastp, blastn, blastx, tblastn and
tblastx. It can be accessed at http://www.ncbi.nlm.nih.gov/BLAST/. A description of how to determine
sequence identity using this program is available at http://www.ncbi.nlm.nih.gov/BLAST/blast_help.html.

Homologs of the Arabidopsis CUT1 protein are characterized by possession of at least 70%
sequence identity counted over the full length alignment with the disclosed Arabidopsis CUT1 amino acid
sequence using the NCBI Blast 2.0, gapped blastp set to default parameters. Such homologous peptides
will more preferably possess at least 75%, more preferably at least 80% and still more preferably at least
90% or 95% sequence identity with the Arabidopsis CUT1 amino acid sequence determined by this
method. When less than the entire sequence is being compared for sequence identity, homologs will
possess at least 75% and more preferably at least 85% and more preferably still at least 90% or 95%
sequence identity over short windows of 10-20 amino acids. Methods for determining sequence identity
over such short windows are described at http://www.ncbi.nlm.nih.gov/BLAST/blast_FAQs.html.
Homologs having the sequence identities described above will, in some embodiments, also possess
VLCFA elongase activity. One of skill in the art will appreciate that these sequence identity ranges are
provided for guidance only; it is entirely possible that strongly significant homologs could be obtained that
fall outside of the ranges provided. The present invention provides not only the peptide homologs
described above, but also nucleic acid molecules that encode such homologs.

Homologs of the Arabidopsis CUT1 cDNA and gene are similarly characterized by possession of
at least 60% sequence identity counted over the full length alignment with the disclosed Arabidopsis
cDNA or gene sequence using the NCBI Blast 2.0, gapped blastn set to default parameters. Such
homologous nucleic acids will more preferably possess at least 70%, more preferably at least 80% and still
more preferably at least 90% or 95% sequence identity determined by this method. When less than the
entire sequence is being compared for sequence identity, homologs will possess at least 85% and more
preferably at least 90% and more preferably still at least 95% sequence identity over 30 nucleotide
windows. Homologs having the sequence identities described above will, in some embodiments, also
encode a polypeptide having VLCFA elongase activity. However, homologs as defined above are useful
for modifying VLCFA elongase activity in transgenic plants (for example, as used in antisense constructs)
even when they do not encode a functional peptide. Again, one of skill in the art will appreciate that these
sequence identity ranges are provided for guidance only; it is entirely possible that strongly significant
nucleic acid homologs could be obtained that fall outside of the ranges provided.

Another indication that two nucleic acid molecules are substantially homologous is that the two
molecules hybridize to each other under stringent conditions when one molecule is used as a hybridization
probe, and the other is present in a biological sample, e.g., genomic material from a cell. Specific
hybridization means that the molecules hybridize substantially only to each other and not to other
molecules that may be present in the genomic material. Stringent conditions are sequence dependent and
are different under different environmental parameters. Generally, stringent conditions are selected to be
about 5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic
strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the
target sequence hybridizes to a perfectly matched probe. Conditions for nucleic acid hybridization and
calculation of stringencies can be found in Sambrook et al. (1989) and Tijssen (1993). Hybridization
conditions and stringencies are further discussed below.
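
By way of illustration only, the temperature window implied by the "about 5°C to 20°C lower than the Tm" guideline may be computed as sketched below. The Tm itself is taken as an input; the value of 70°C in the example is a hypothetical figure, and no particular method of estimating Tm is implied.

```python
from typing import Tuple


def stringent_temperature_range(tm_celsius: float) -> Tuple[float, float]:
    """Return (lowest, highest) hybridization temperature in degrees Celsius.

    Applies the guideline that stringent conditions lie about 5 to 20 degrees
    below the thermal melting point (Tm) of the specific sequence.
    """
    return (tm_celsius - 20.0, tm_celsius - 5.0)


if __name__ == "__main__":
    # Hypothetical Tm of 70 degrees C for a probe/target pair.
    low, high = stringent_temperature_range(70.0)
    print(f"stringent window: {low:.1f} to {high:.1f} degrees C")
```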

Nucleic acid sequences that do not show a high degree of identity may nevertheless encode
similar amino acid sequences, due to the degeneracy of the genetic code. It is understood that changes in
nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that
all encode substantially the same protein.
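
By way of illustration only, the degeneracy described above may be demonstrated by enumerating every DNA sequence that encodes a short peptide. The codon table below is a deliberately small subset of the standard genetic code, and the tripeptide Met-Lys-Phe is a hypothetical example that is not taken from the sequence listing.

```python
from itertools import product

# Subset of the standard genetic code (DNA codons), sufficient for the example.
CODONS = {
    "M": ["ATG"],          # methionine: a single codon
    "K": ["AAA", "AAG"],   # lysine: two synonymous codons
    "F": ["TTT", "TTC"],   # phenylalanine: two synonymous codons
    "W": ["TGG"],          # tryptophan: a single codon
}

# Reverse lookup used to translate a coding sequence back to a peptide.
AMINO_ACID = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def coding_sequences(peptide: str) -> list:
    """All DNA sequences that encode the given peptide under CODONS."""
    choices = [CODONS[aa] for aa in peptide]
    return ["".join(combo) for combo in product(*choices)]


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, three nucleotides at a time."""
    return "".join(AMINO_ACID[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    for seq in coding_sequences("MKF"):
        print(seq, "->", translate(seq))
```

In this sketch the tripeptide is encoded by four distinct nucleic acid sequences (1 x 2 x 2 codon choices), each of which translates to the same amino acid sequence.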

Probes and primers: Nucleic acid probes and primers may readily be prepared based on the
nucleic acids provided by this invention. A probe comprises an isolated nucleic acid attached to a
detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands,
chemiluminescent agents, and enzymes. Methods for labeling and guidance in the choice of labels
appropriate for various purposes are discussed, e.g., in Sambrook et al. (1989) and Ausubel et al. (1987).

Primers are short nucleic acids, preferably DNA oligonucleotides 15 nucleotides or more in
length. Primers may be annealed to a complementary target DNA strand by nucleic acid hybridization to
form a hybrid between the primer and the target DNA strand, and then extended along the target DNA
strand by a DNA polymerase enzyme. Primer pairs can be used for amplification of a nucleic acid
sequence, e.g., by the polymerase chain reaction (PCR) or other nucleic-acid amplification methods
known in the art.

Methods for preparing and using probes and primers are described, for example, in Sambrook
et al. (1989), Ausubel et al. (1987), and Innis et al. (1990). PCR primer pairs can be derived from a
known sequence, for example, by using computer programs intended for that purpose such as Primer
(Version 0.5, © 1991, Whitehead Institute for Biomedical Research, Cambridge, MA). One of skill in the
art will appreciate that the specificity of a particular probe or primer increases with its length. Thus, for
example, a primer comprising 20 consecutive nucleotides of the Arabidopsis CUT1 cDNA or gene will
anneal to a target sequence (e.g., a corresponding CUT1 gene from Zea mays) with a higher specificity
than a corresponding primer of only 15 nucleotides. Thus, in order to obtain greater specificity, probes
and primers may be selected that comprise 20, 25, 30, 35, 40, 50 or more consecutive nucleotides of the
Arabidopsis CUT1 cDNA or gene sequences. Such probes and primers are useful for obtaining CUT1
nucleic acid molecules (cDNA, genomic sequences, and portions of these molecules) both from Arabidopsis
and other plant species.
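
By way of illustration only, the notion of an oligonucleotide comprising a given number of consecutive nucleotides of a known sequence may be sketched as follows. The function name is hypothetical, the 40-nucleotide template is an arbitrary placeholder rather than any part of the CUT1 sequences, and the sketch does not reproduce the Primer program or any melting-temperature or specificity scoring.

```python
def candidate_primers(template: str, length: int = 20) -> list:
    """Every window of `length` consecutive nucleotides from the template.

    The lower bound of 15 follows the preferred minimum primer length
    stated in the text; enforcing it here is a choice made for the sketch.
    """
    if length < 15:
        raise ValueError("primers are preferably 15 nucleotides or more")
    template = template.upper()
    return [template[i:i + length] for i in range(len(template) - length + 1)]


if __name__ == "__main__":
    # Arbitrary 40-nucleotide placeholder sequence (not from the sequence listing).
    template = "ATGGCTAAGCTTGGATCCGATTACGCCAAGCTTGCATGCC"
    for k in (15, 20, 30):
        windows = candidate_primers(template, k)
        print(f"{k}-mers available: {len(windows)}; first: {windows[0]}")
```

A template of N nucleotides yields N - k + 1 candidate windows of length k, so longer primers leave fewer positions to choose from while offering the greater specificity noted above.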

Vector: A nucleic acid molecule as introduced into a host cell, thereby producing a transformed
host cell. A vector may include nucleic acid sequences that permit it to replicate in the host cell, such as
an origin of replication. A vector may also include one or more selectable marker genes and other genetic
elements known in the art.

Transformed: A transformed cell is a cell into which has been introduced a nucleic acid
molecule by molecular biology techniques. As used herein, the term transformation encompasses all
techniques by which a nucleic acid molecule might be introduced into such a cell, including
transformation with Agrobacterium vectors, transfection with viral vectors, transformation with plasmid
vectors, and introduction of naked DNA by electroporation, lipofection, and particle gun acceleration.

Isolated: An "isolated" biological component (such as a nucleic acid or protein) has been
substantially separated or purified away from other biological components in the cell of the organism in
which the component naturally occurs, i.e., other chromosomal and extrachromosomal DNA and RNA,
and proteins. Nucleic acids and proteins which have been "isolated" thus include nucleic acids and
proteins purified by standard purification methods. The term also embraces nucleic acids and proteins
prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acids.

Purified: The term purified does not require absolute purity; rather, it is intended as a relative
term. Thus, for example, a purified CUT1 protein preparation is one in which the CUT1 protein is more
enriched than the protein is in its natural environment within a cell. Preferably, a preparation of CUT1
protein is purified such that CUT1 protein represents at least 50% of the total protein content of the
preparation.

Operably linked: A first nucleic acid sequence is operably linked with a second nucleic acid
sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic
acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the
transcription or expression of the coding sequence. Generally, operably linked DNA sequences are
contiguous and, where necessary to join two protein coding regions, in the same reading frame.

Recombinant: A recombinant nucleic acid is one that has a sequence that is not naturally
occurring or has a sequence that is made by an artificial combination of two otherwise separated segments
of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly,
by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering
techniques.

Ortholog: two nucleotide or amino acid sequences are orthologs of each other if they share a
common ancestral sequence and diverged when a species carrying that ancestral sequence split into two
species. Orthologous sequences are also homologous sequences.

Transgenic plant: as used herein, this term refers to a plant that contains recombinant genetic
material not normally found in plants of this type and which has been introduced into the plant in question
(or into progenitors of the plant) by human manipulation. Thus, a plant that is grown from a plant cell into
which recombinant DNA is introduced by transformation is a transgenic plant, as are all offspring of that
plant which contain the introduced DNA (whether produced sexually or asexually).



II. Sequence Listing and Figures

The nucleic and amino acid sequences listed in the accompanying sequence listing are shown
using standard letter abbreviations for nucleotide bases, and three letter code for amino acids. Only one
strand of each nucleic acid sequence is shown, but the complementary strand is understood to be included
by any reference to the displayed strand.
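
By way of illustration only, the complementary strand implied by a displayed strand may be derived as sketched below, using the standard Watson-Crick base pairing (A with T, C with G) and reading the result 5' to 3'. The short sequences used are placeholders and are not taken from the sequence listing.

```python
# Translation table for Watson-Crick complements, preserving letter case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    displayed = "ATGC"  # placeholder sequence
    print(displayed, "->", reverse_complement(displayed))
```

Applying the function twice returns the displayed strand, which is why listing a single strand is sufficient to define both.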

Seq. I.D. No. 1 shows the nucleotide sequence of the CUT1 gene and the encoded amino acid
sequence.

Seq. I.D. No. 2 shows the nucleotide sequence of the CUT1 cDNA.

Seq. I.D. No. 3 shows the nucleotide sequence of the CUT1 open reading frame.

Seq. I.D. No. 4 shows the amino acid sequence of the CUT1 protein.

Seq. I.D. Nos. 5-11 show primers useful in PCR amplification of various regions of the
CUT1 gene, cDNA or ORF.

Seq. I.D. No. 12 shows the promoter region of the CUT1 genomic clone.

Fig. 1 shows the pathways of wax biosynthesis in Arabidopsis.

III. Isolation and Characterization of the CUT1 cDNA

5 The CUTl cDNA was initially ideniificd using a TBLASTN homology search (Allschul ct al., 

1990) of the database of expressed sequenced tags (ESTs) of anonymous Arobidopsis cDNA clones 
(Newman et al., 1994) using the deduced amino acid sequence of the FAEl gene. The search found 14 
ESTs in the database which had open reading frames with significant homology to FAEL These ESTs 
did not correspond to known condensing enzymes such as chalcone synthase or 3-keioacyUacyl carrier 

1 0 protein synthase III. 

One of these ESTs was selected for further investigation, and the corresponding full length 
cDNA was isolated. This cDNA is herein referred to as the CUT1 cDNA. Sequencing demonstrated 
that the CUT1 cDNA was 1829 nucleotides long, approximately the size of the FAE1 transcript (James et 
al., 1995). The CUT1 cDNA contains one open reading frame of 497 amino acids, which is shorter than 
both the FAE1 sequence (506 amino acids) and the jojoba KCS (521 amino acids). The CUT1 cDNA 
and the protein it encodes are shown in Seq. I.D. Nos. 2 and 4, respectively. 

There is an in-frame stop codon, TAA, 15 nucleotides upstream of the most 5' ATG, suggesting 
that this sequence indeed represents the full length amino acid sequence of the protein. Thus, the CUT1 
cDNA as depicted in Seq. I.D. No. 2 has a 5' untranslated region of 58 nucleotides, an open reading 
frame of 1491 nucleotides and a 3' untranslated region of 258 nucleotides, excluding the poly(A) tail (22 
As). Comparison of the deduced amino acid sequence of the CUT1 protein to FAE1 revealed that they 
are 50.0% identical and 74.7% similar. 
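
The region lengths quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the constant and function names are not from the source, and the conclusions it points to (that the 1829-nucleotide figure includes the 22-residue poly(A) tail, and that the 1491-nucleotide reading frame corresponds to the 497 sense codons) are inferences from the numbers rather than statements made in the text.

```python
# Region lengths in nucleotides, as quoted in the paragraph above.
FIVE_PRIME_UTR = 58
OPEN_READING_FRAME = 1491
THREE_PRIME_UTR = 258
POLY_A_TAIL = 22

# Totals quoted elsewhere in the same section.
STATED_CDNA_LENGTH = 1829
STATED_PROTEIN_LENGTH = 497


def cdna_length(include_poly_a: bool) -> int:
    """Sum the cDNA regions, optionally counting the poly(A) tail."""
    total = FIVE_PRIME_UTR + OPEN_READING_FRAME + THREE_PRIME_UTR
    return total + POLY_A_TAIL if include_poly_a else total


def codons_in_orf() -> int:
    """Return the number of whole codons in the open reading frame."""
    codons, remainder = divmod(OPEN_READING_FRAME, 3)
    if remainder:
        raise ValueError("open reading frame is not a multiple of three")
    return codons


if __name__ == "__main__":
    print("without poly(A):", cdna_length(include_poly_a=False))
    print("with poly(A):   ", cdna_length(include_poly_a=True))
    print("codons in ORF:  ", codons_in_orf())
```

Under these assumptions the regions add up to the stated cDNA length only when the poly(A) tail is counted, and the reading frame divides evenly into the stated number of amino acids.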

IV. Isolation and Characterization of the CUT1 Gene 

An Arabidopsis CUT1 genomic clone was isolated from a genomic library in λGEM11 by probing 
nitrocellulose plaque lifts with a full-length CUT1 cDNA clone. A 2.5 kb long SalI fragment containing 580 
bp of the coding sequence and 1951 bp of the 5' upstream region was subcloned into the SalI site of pT7T3 
18U plasmid (Pharmacia), followed by complete sequencing on both strands. The sequence of this genomic 
clone is shown in Seq. I.D. No. 1. 

In situ hybridization studies in developing shoots, leaves and siliques of Arabidopsis indicated 
epidermis-specific expression of the CUT1 gene, as expected of a gene encoding an enzyme involved in wax 
biosynthesis. 

V. Analysis of the CUT1 Promoter 

In order to confirm the tissue and cell specificity of the CUT1 promoter, 5' flanking sequences 
from the CUT1 genomic clone were operably linked to the uidA reporter gene encoding β-glucuronidase 
(GUS). Two constructs were made, one having a 1.9 kb promoter fragment and the second containing a 
truncated 1.2 kb promoter. These promoter-GUS fusions were introduced into Arabidopsis and tobacco by 
Agrobacterium-mediated transformation and the promoter function characterized in transgenic plants. 

To obtain the 1.9 and 1.2 kb regions of the CUT1 promoter sequence, synthetic oligonucleotides 
homologous to portions of the 5' untranslated region of the genomic clone were used as primers to amplify 
either a 1949 bp or a 1209 bp promoter fragment by PCR. As shown in Figure 1, the upstream primer was 
5'-GTGCTTTATATATGTTTG-3' (cutpro3) (Seq. I.D. No. 5) in combination with the downstream 
primer 5'-CGTCGGAGAGTTTTAATG-3' (cutpro1) (Seq. I.D. No. 6) for the PCR synthesis of the 1949 
bp fragment, and 5'-CTTCGATATCGGTTGTTG-3' (cutpro2) (Seq. I.D. No. 7) and cutpro1 for the 
amplification of the 1209 bp fragment. In both cases, the amplified products were subcloned in the HincII 
site of the plasmid pT7T3 18U (Pharmacia). The inserts were then cleaved out with HindIII and XbaI and 
directionally subcloned into the corresponding sites of the binary Ti plasmid pBI101 (Clontech), which 
contains a promoterless GUS gene (Jefferson et al., 1987). The pCUT1-GUS fusion constructs in pBI101 
were introduced into Agrobacterium tumefaciens strain GV3101 (Koncz and Schell, 1986) by 
electroporation and selected for resistance to kanamycin (50 µg/ml). 
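
As a rough characterization of the three promoter primers listed above, the following sketch reports each primer's length, GC content and an approximate melting temperature. It uses the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is a common estimate for short oligonucleotides and is an assumption of this example; the source does not state how annealing conditions were chosen. The function and dictionary names are illustrative.

```python
# Promoter primer sequences (5' to 3') as given in the text.
PROMOTER_PRIMERS = {
    "cutpro3": "GTGCTTTATATATGTTTG",  # upstream primer, 1949 bp fragment
    "cutpro1": "CGTCGGAGAGTTTTAATG",  # shared downstream primer
    "cutpro2": "CTTCGATATCGGTTGTTG",  # upstream primer, 1209 bp fragment
}


def primer_stats(sequence: str) -> dict:
    """Return length, GC percentage and Wallace-rule Tm for a DNA primer."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return {
        "length": len(seq),
        "gc_percent": round(100.0 * gc / len(seq), 1),
        # Wallace rule: an approximation intended for short oligonucleotides.
        "tm_wallace_c": 2 * at + 4 * gc,
    }


if __name__ == "__main__":
    for name, seq in PROMOTER_PRIMERS.items():
        print(name, primer_stats(seq))
```

The estimate ignores salt and primer concentration, so it is only a starting point for choosing an annealing temperature.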

For transformation of tobacco, Agrobacterium harbouring the pCUT1-GUS construct was 
co-cultivated with leaf pieces of Nicotiana tabacum SR1 and transformants were selected with kanamycin 
(100 mg/mL) on solid medium (Lee and Douglas, 1996). Arabidopsis thaliana (L.) Heynh. ecotype 
Columbia was transformed with pCUT1-GUS binary vector using a combination of in planta (Chang et al., 
1994; Katavic et al., 1994) and vacuum infiltration methods (Bechtold et al., 1993). Plants were grown 
until the primary inflorescence shoots reached 1-2 cm in height, when these bolts were cut off. The wound 
site was inoculated with 50 mL of an overnight Agrobacterium culture. After 4-6 days a number of 
secondary inflorescences that appeared were cut off, and vacuum infiltration was performed on these 
plants using the conditions described by Bechtold et al. (1993). Screening for transformed seed was done 
on 50 µg/mL kanamycin as described previously (Katavic et al., 1994). 

Tissue sections of transgenic plants containing the pCUT1-GUS constructs were placed in 100 
mM NaPO4 (pH 7) and 1 mM spermidine for 15 min, then incubated at 37°C in 0.5 K3[Fe(CN)6], 0.01% 
Triton X-100, 1 mM EDTA, 10 mM β-mercaptoethanol, 5-bromo-4-chloro-3-indolyl-β-D-glucuronide in 
100 mM NaPO4 (pH 7), until a blue color appeared (after approximately 1 hr). Following incubation with 
the substrate, chlorophyll was removed from the sections using a graded ethanol series. 

In both recipient plant species, Arabidopsis and tobacco, the CUT1 expression pattern mirrored that 
observed in the in situ experiments. Furthermore, both long and short CUT1 promoter fragments targeted 
expression of the uidA gene exclusively to the epidermis. No GUS expression was detected in any of the 
other cell types in the stems or leaves of transgenic plants. Thus, the Arabidopsis CUT1 promoter is 
regulated in a tissue-specific and cell-specific manner, and epidermis specificity appears to be retained 
even in unrelated plant species like tobacco. In addition, no differences in the strength of expression were 
detected between the 1.9 kb and 1.2 kb promoters. 



VI. Preferred Methods for Producing CUT1 Nucleic Acids 

With the provision of the CUT1 cDNA and gene (the "CUT1 nucleic acids"), the polymerase 
chain reaction (PCR) may now be utilized in a preferred method for producing the CUT1 nucleic acids. 
PCR amplification of the CUT1 cDNA sequence may be accomplished either by direct PCR from a plant 
cDNA library or by Reverse-Transcription PCR (RT-PCR) using RNA extracted from plant cells as a 
template. Methods and conditions for both direct PCR and RT-PCR are known in the art and are 
described in Innis et al. (1990). Suitable plant cDNA libraries for direct PCR include the Arabidopsis 
cDNA library described by Newman et al. (1994). Similarly, the CUT1 genomic sequence may be 
amplified directly from genomic DNA extracted from plants, or from plant genomic DNA libraries. 
Amplification may be used to obtain the full length cDNA or genomic sequence, or may be used to 
amplify selected portions of these molecules (for example, for use in antisense constructs). 

The selection of PCR primers will be made according to the portions of the CUT1 nucleic acids 
which are to be amplified. Variations in amplification conditions may be required to accommodate 
primers of differing lengths; such considerations are well known in the art and are discussed in Innis et 
al. (1990), Sambrook et al. (1989), and Ausubel et al. (1987). By way of example only, the entire CUT1 
cDNA molecule as shown in Seq. I.D. No. 2 may be amplified using the following combination of 
primers: 

primer 1 5' AAATACGCTAATCAGATTTTGTAA 3' (Seq. I.D. No. 8) 
primer 2 5' TTTAAAGAGAGAGAAATATKriTA 3' (Seq. I.D. No. 9) 

The open reading frame portion of the cDNA may be amplified using the following primer pair: 

primer 3 5' ATGGCTCAGGGACCGATGGCAGAG 3' (Seq. I.D. No. 10) 
primer 4 5' GAGGACGAGAAACTAAAAAATACC 3' (Seq. I.D. No. 11) 

These primers are illustrative only; it will be appreciated by one skilled in the art that many 
different primers may be derived from the provided sequences in order to amplify particular regions of 
the CUT1 sequences. Resequencing of PCR products obtained by these amplification procedures is 
recommended; this will facilitate confirmation of the amplified CUT1 sequence and will also provide 
information on natural variation in this sequence in different ecotypes and plant populations. 

Oligonucleotides which are derived from the CUT1 nucleic acid sequences and which are 
suitable for use as PCR primers to amplify the CUT1 nucleic acid sequences are encompassed within the 
scope of the present invention. Preferably, such oligonucleotide primers will comprise a sequence of 15-20 
consecutive nucleotides of the CUT1 nucleic acid sequences. To enhance amplification specificity, 
primers comprising at least 20-30 consecutive nucleotides of these sequences may also be used. 
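
To make the notion of "15-20 consecutive nucleotides" concrete, the sketch below enumerates every contiguous window of a chosen length range from a template sequence. The template used here is a short made-up string, not any CUT1 sequence, and the function name and parameters are hypothetical; a real primer search would also screen candidates for melting temperature, self-complementarity and uniqueness in the target genome.

```python
from typing import Iterator, Tuple

# Hypothetical 25-nucleotide template used only to exercise the function.
TOY_TEMPLATE = "ATGGCTAGCTAGGATCCGATTACGA"


def candidate_primers(
    template: str, min_len: int = 15, max_len: int = 20
) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for every contiguous window of the template
    whose length lies between min_len and max_len inclusive."""
    seq = template.upper()
    for length in range(min_len, max_len + 1):
        # The last valid start position leaves exactly `length` bases.
        for start in range(len(seq) - length + 1):
            yield start, seq[start:start + length]


if __name__ == "__main__":
    candidates = list(candidate_primers(TOY_TEMPLATE))
    print(len(candidates), "candidate windows")
    print(candidates[0])
```

Each yielded window is a forward-strand candidate; a reverse primer would be taken from the reverse complement of the template.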


VII. Cloning CUT1 Variants 

With the provision herein of the CUT1 nucleic acid sequences, the cloning by standard 
methodologies of corresponding cDNAs and genes from other ecotypes and plant species, as well as 
polymorphic forms of the disclosed sequences, is now enabled. Thus, the present invention includes 
methods of isolating a nucleotide sequence encoding a plant very long chain fatty acid elongation enzyme 
from a plant. Both conventional hybridization and PCR amplification procedures may be utilized to 
clone such sequences. Common to both of these techniques is the hybridization of probes or primers 
derived from the disclosed CUT1 nucleic acid sequences to a target nucleotide preparation, which may 
be, in the case of conventional hybridization approaches, a cDNA or genomic library or, in the 
case of PCR amplification, extracted genomic DNA, mRNA, a cDNA library or a genomic library. 

Direct PCR amplification may be performed on cDNA libraries prepared from the plant species 
in question, or RT-PCR may be performed using mRNA extracted from the plant cells using standard 
methods. PCR primers will comprise at least 15 consecutive nucleotides of the CUT1 nucleic acid 
sequences. One of skill in the art will appreciate that sequence differences between the disclosed CUT1 
nucleic acid sequences and the target gene to be amplified may result in lower amplification efficiencies. 
To compensate for this, longer PCR primers or lower annealing temperatures may be used during the 
amplification cycle. Where lower annealing temperatures are used, sequential rounds of amplification 
using nested primer pairs may be necessary to enhance specificity. 

For conventional hybridization techniques, the hybridization probe is preferably labeled with a 
detectable label such as a radioactive label, and the probe is of at least 20 nucleotides in length. As is 
well known in the art, increasing length of hybridization probes tends to give enhanced specificity. The 
labeled probe derived from, for example, the CUT1 cDNA sequence may be hybridized to a plant 
cDNA or genomic library and the hybridization signal detected using means known in the art. The 
hybridizing colony or plaque (depending on the type of library used) is then purified and the cloned 
sequence contained in that colony or plaque isolated and characterized. 

VIII. Use of the CUT1 Nucleic Acids to Produce 
Plants with Modified VLCFA Composition 

Once a gene or cDNA ("nucleic acid") encoding a protein involved in the determination of a 
particular plant characteristic has been isolated, standard techniques may be used to express the nucleic 
acid in transgenic plants in order to modify that particular plant characteristic. The basic approach is to 
clone the nucleic acid into a transformation vector, such that it is operably linked to control sequences 
(e.g., a promoter) which direct expression of the open reading frame in plant cells. The transformation 
vector is then introduced into plant cells by one of a number of techniques (e.g., electroporation) and 
progeny plants containing the introduced nucleic acid are selected. Preferably all or part of the 
transformation vector will stably integrate into the genome of the plant cell. That part of the 
transformation vector which integrates into the plant cell and which contains the introduced nucleic acid 
and associated sequences for controlling expression (the introduced "transgene") may be referred to as 
the recombinant expression cassette. 

Selection of progeny plants containing the introduced transgene may be made based upon the 
detection of an altered phenotype. Such a phenotype may result directly from the nucleic acid cloned 
into the transformation vector or may be manifested as enhanced resistance to a chemical agent (such as 
an antibiotic) as a result of the inclusion of a dominant selectable marker gene incorporated into the 
transformation vector. 

The choice of (a) control sequences and (b) how the nucleic acid (or selected portions of the 
nucleic acid) are arranged in the transformation vector relative to the control sequences determine, in 
part, how the plant characteristic affected by the introduced nucleic acid is modified. For example, the 
control sequences may be tissue specific, such that the nucleic acid is only expressed in particular tissues 
of the plant (e.g., pollen) and so the affected characteristic will be modified only in those tissues. The 
nucleic acid sequence may be arranged relative to the control sequence such that the nucleic acid 
transcript is expressed normally, or in an antisense orientation. Expression of an antisense RNA 
corresponding to the cloned nucleic acid will result in a reduction of the targeted gene product (the 
targeted gene product being the protein encoded by the plant gene from which the introduced nucleic 
acid was derived). Over-expression of the introduced nucleic acid, resulting from a plus-sense 
orientation of the nucleic acid relative to the control sequences in the vector, may lead to an increase in 
the level of the gene product, or may result in co-suppression (also termed "sense suppression") of that 
gene product. 

Successful examples of the modification of plant characteristics by transformation with cloned 
nucleic acid sequences are replete in the technical and scientific literature. Selected examples, which 
serve to illustrate the current knowledge in this field of technology, and which are herein incorporated by 
reference, include: 

U.S. Patent No. 5,451,514 to Boudet (modification of lignin synthesis using antisense RNA and 
co-suppression); 

U.S. Patent No. 5,443,974 to Hitz (modification of saturated and unsaturated fatty acid levels 
using antisense RNA and co-suppression); 

U.S. Patent No. 5,530,192 to Murase (modification of amino acid and fatty acid composition 
using antisense RNA); 

U.S. Patent No. 5,455,167 to Voelker (modification of medium chain fatty acids); 

U.S. Patent No. 5,231,020 to Jorgensen (modification of flavonoids using co-suppression); 

U.S. Patent No. 5,583,021 to Dougherty (modification of virus resistance by expression of 
plus-sense untranslatable RNA); 

WO 96/13582 (modification of seed VLCFA composition using over-expression, co-suppression 
and antisense RNA in conjunction with the Arabidopsis FAE1 gene); and 

WO 95/15387 (modification of seed VLCFA composition using over-expression of jojoba wax 
synthesis gene). 

These examples include descriptions of transformation vector selection, transformation 
techniques and the construction of constructs designed to over-express the introduced nucleic acid or to 
express antisense RNA corresponding to the nucleic acid. In light of the foregoing and the provision 
herein of the CUT1 nucleic acids, it is thus apparent that one of skill in the art will be able to introduce 
these nucleic acids, or derivative forms of these molecules (e.g., antisense forms), into plants in order to 
produce plants having modified VLCFA compositions. Examples one and two below provide 
illustrations of this in which the CUT1 cDNA is operably linked to the CaMV 35S promoter sequence, 
cloned into the pBIN19 transformation vector and introduced into Arabidopsis using a vacuum infiltration 
method. 

As reported in Example one, certain of the plants transformed in this way had no detectable 
epicuticular wax layers, indicating that transformation with the CUT1 cDNA had disrupted normal 
VLCFA synthesis in the plant epidermal cells. Such disruption is likely attributable to the phenomenon 
termed co-suppression (or sense-suppression). These plants are thus referred to as "CUT1-suppressed". 
This phenomenon may be affected by factors such as positional location of the introduced sequences in 
the plant genome. 

Over-expression of CUT1 protein in transgenic plants, resulting in plants with enhanced epicuticular 
wax layers, will be a useful agronomic trait, providing increased drought and insect resistance. For 
example, drought resistance in rice is associated with high wax lines rich in Q^, Cjj and C35 alkanes 
(O'Toole and Cruz, 1983; Haque et al., 1992). Increased wax deposition in transgenic plants can be 
accomplished by overexpression of CUT1 protein, while the identification of the CUT1 promoter allows 
targeting of lipid modification enzymes such as desaturases, thioesterases and other condensing enzymes 
with different specificities to the epidermal cells to modify wax composition. 

Transformation of plants with the CUT1 nucleic acids or derivatives thereof may be used to 
modify other plant characteristics, such as seed coat composition and seed oil composition. Because 
condensing enzymes are pivotal enzymes in the synthesis of VLCFAs, controlling levels of accumulation 
of VLCFAs and their acyl chain length (Millar and Kunst, 1997) through the manipulation of CUT1 
expression will permit the production of plants having novel fatty acid compositions. For instance, the 
accumulation of VLCFAs in tobacco seed expressing FAE1 from Arabidopsis (Millar and Kunst, 1997) 
raises the possibility of producing VLCFAs in plant species that currently do not synthesize VLCFAs. In 
addition, targeting of CUT1 to seeds will be useful to produce crop plants capable of synthesising new, 
agronomically important VLCFAs in seed oil. 

Disruption of CUT1 activity in transgenic plants also provides a simple means for obtaining 
conditional male sterility in plants (see Example two). One of the major factors contributing to increases 
in crop productivity is the development of hybrid varieties of crops. Several different breeding strategies 
have been used to produce hybrid seed, but none of these strategies can be used as a general approach in 
all crop plants (Goldberg et al., 1993). As an alternative, genetically engineered systems and strategies for 
male fertility control that are applicable to a wide range of crops have recently been developed. For 
example, nuclear male sterility has been engineered by (1) tapetum-specific expression of a bacterial 
RNAse gene (Mariani et al., 1990, 1992), (2) overexpression of the rolC gene from Agrobacterium 
rhizogenes (Fladung, 1990; Schmulling et al., 1988, 1992), (3) expression of glucanase that disrupts the 
callose wall of the microsporophyte prematurely (Tsuchiya et al., 1995; Worrall et al., 1992), (4) the 
inhibition of flavonoid biosynthetic genes like chalcone synthase and dihydroflavonol 4-reductase (van der 
Krol et al., 1988, 1990; van der Meer et al., 1992; Napoli et al., 1990; Taylor and Jorgensen, 1992), and (5) 
altered expression of stilbene synthase (Fischer et al., 1997). However, in most of these cases the 
restoration of fertility is not simple, or not easily controlled. In contrast, conditional male sterility caused 
by suppression of CUT1 activity is easily reversible under high relative humidity. 

The selection of vectors and promoters appropriate for targeting particular characteristics for 
modification (such as seed-specific expression) is well known; the following paragraphs set forth 
general guidance on the various options available in producing transgenic plants having modified 
VLCFA composition. 

a. Plant Types 

VLCFAs are found in all plant types, and thus DNA molecules according to the present 
invention (e.g., the CUT1 cDNA, gene, homologs and antisense forms thereof) may be introduced into 
any plant type in order to modify the VLCFA composition of the plant. Thus, the sequences of the 
present invention may be used to modify VLCFA composition in any higher plant, including 
monocotyledonous and dicotyledonous plants, including, but not limited to, maize, wheat, rice, barley, 
soybean, beans in general, rape/canola, alfalfa, flax, sunflower, safflower, brassica, cotton, peanut, 
clover; vegetables such as lettuce, tomato, cucurbits, potato, carrot, radish, pea, lentils, cabbage, broccoli, 
brussel sprouts, peppers; tree fruits such as apples, pears, peaches, apricots; flowers such as carnations and 
roses. 


b. Vector Construction, Choice of Promoters 

A number of recombinant vectors suitable for stable transfection of plant cells or for the 
establishment of transgenic plants have been described, including those described in Pouwels et al. 
(1987), Weissbach and Weissbach (1989), and Gelvin et al. (1990). Typically, plant transformation 
vectors include one or more cloned plant genes (or cDNAs) under the transcriptional control of 5' and 3' 
regulatory sequences and a dominant selectable marker. Such plant transformation vectors typically also 
contain a promoter regulatory region (e.g., a regulatory region controlling inducible or constitutive, 
environmentally- or developmentally-regulated, or cell- or tissue-specific expression), a transcription 
initiation start site, a ribosome binding site, an RNA processing signal, a transcription termination site, 
and/or a polyadenylation signal. 

Examples of constitutive plant promoters which may be useful for expressing CUT1 nucleic 
acids include: the cauliflower mosaic virus (CaMV) 35S promoter, which confers constitutive, high-level 
expression in most plant tissues (see, e.g., Odell et al., 1985; Dekeyser et al., 1990; Terada and 
Shimamoto, 1990); the nopaline synthase promoter (An et al., 1988); and the octopine synthase promoter 
(Fromm et al., 1989). 

A variety of plant gene promoters that are regulated in response to environmental, hormonal, 
chemical, and/or developmental signals also can be used for expression of CUT1 nucleic acids in plant 
cells, including promoters regulated by: (a) heat (Callis et al., 1988); (b) light (e.g., the pea rbcS-3A 
promoter, Kuhlemeier et al., 1989; the maize rbcS promoter, Schaffner and Sheen, 1991; and the 
chlorophyll a/b binding protein promoter, Simpson et al., 1985); (c) hormones, such as abscisic acid 
(Marcotte et al., 1989); (d) wounding (e.g., wun1, Siebertz et al., 1989); and (e) chemicals such as 
methyl jasmonate or salicylic acid. It may also be advantageous to employ tissue-specific promoters, 
such as those described by Roshal et al. (1987), Schernthaner et al. (1988), and Bustos et al. (1989). 

Alternatively, tissue-specific (root, leaf, flower, and seed, for example) promoters (Carpenter et al., 
1992; Denis et al., 1993; Opperman et al., 1993; Stockhause et al., 1997; Roshal et al., 1987; Schernthaner et 
al., 1988; and Bustos et al., 1989) can be fused to the coding sequence to obtain particular expression in 
respective organs. In addition, the timing of the expression can be controlled by using promoters such as 
those acting at senescence (Gan and Amasino, 1995) or late seed development (Odell et al., 1994). The 
promoter region of the CUT1 genomic sequence disclosed herein confers epidermis-specific expression in 
Arabidopsis and tobacco. Accordingly, the native promoter may be used to obtain epidermis-specific 
expression of the introduced transgene. 

For producing conditionally male sterile plants by blocking CUT1 activity in pollen, it is preferable 
to use a pollen-specific promoter (so as to avoid pleiotropic effects). Thus, the CUT1 coding region may be 
expressed under the control of tapetum-specific promoters such as TA29 (Mariani et al., 1990, 1992), 
MS2 (Aarts et al., 1997), and tap1 (Nacken et al., 1991). 

Plant transformation vectors may also include RNA processing signals, for example, introns, 
which may be positioned upstream or downstream of the CUT1 nucleic acid sequence in the transgene. 
In addition, the expression vectors may also include additional regulatory sequences from the 
3'-untranslated region of plant genes, e.g., a 3' terminator region to increase the stability of the 
mRNA, such as the PI-II terminator region of potato or the octopine or nopaline synthase 3' terminator 
regions. 

Finally, as noted above, plant transformation vectors may also include dominant selectable 
marker genes to allow for the ready selection of transformants. Such genes include those encoding 
antibiotic resistance (e.g., resistance to hygromycin, kanamycin, bleomycin, G418, streptomycin 
or spectinomycin) and herbicide resistance (e.g., phosphinothricin acetyltransferase). 

c. Arrangement of CUT1 Nucleic Acids in Vector 

As noted above, the particular arrangement of the CUT1 nucleic acid in the transformation 
vector will be selected according to the expression of the nucleic acid desired. 

Where enhanced VLCFA synthesis is desired, the CUT1 nucleic acid may be operably linked to 
a constitutive high-level promoter such as the CaMV 35S promoter. Modification of VLCFA synthesis 
may also be achieved by introducing into a plant a transformation vector containing a variant form of the 
CUT1 nucleic acid, for example a form which varies from the exact nucleotide sequence of the CUT1 
nucleic acid, but which encodes a protein that retains the functional characteristic of the CUT1 protein, 
i.e., very long chain fatty acid elongation activity. 

In contrast, a reduction of VLCFA synthesis may be obtained by introducing antisense 
constructs based on the CUT1 nucleic acid sequence into plants. For antisense suppression, the CUT1 
nucleic acid is arranged in reverse orientation relative to the promoter sequence in the transformation 
vector. The introduced sequence need not be the full length CUT1 nucleic acid, and need not be exactly 
homologous to the CUT1 nucleic acid. Generally, however, where the introduced sequence is of shorter 
length, a higher degree of homology to the native CUT1 sequence will be needed for effective antisense 
suppression. Preferably, the introduced antisense sequence in the vector will be at least 30 nucleotides in 
length, and improved antisense suppression will typically be observed as the length of the antisense 
sequence increases. Preferably, the length of the antisense sequence in the vector will be greater than 
100 nucleotides. Transcription of an antisense construct as described results in the production of RNA 
molecules that are the reverse complement of mRNA molecules transcribed from the endogenous CUT1 
gene in the plant cell. Although the exact mechanism by which antisense RNA molecules interfere with 
gene expression has not been elucidated, it is believed that antisense RNA molecules bind to the 
endogenous mRNA molecules and thereby inhibit translation of the endogenous mRNA. 
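
The relationship between a sense-strand sequence and the antisense transcript described above can be illustrated with a short sketch. The six-nucleotide input is a made-up example rather than a CUT1 sequence, and the function names are hypothetical; the code shows only the base-pairing logic, not construct design.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(sense_dna: str) -> str:
    """Return the RNA produced when the sense-strand sequence is cloned in
    reverse orientation: the reverse complement, with U in place of T."""
    return reverse_complement(sense_dna).upper().replace("T", "U")


if __name__ == "__main__":
    sense = "ATGGCT"  # hypothetical fragment, written 5' to 3'
    print("sense DNA:     ", sense)
    print("antisense RNA: ", antisense_rna(sense))
```

Read 5' to 3', the antisense RNA pairs base for base with the mRNA transcribed from the sense strand, which is the binding the paragraph above describes.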

Suppression of endogenous CUT1 gene expression can also be achieved using ribozymes. 
Ribozymes are synthetic RNA molecules that possess highly specific endoribonuclease activity. The 
production and use of ribozymes are disclosed in U.S. Patent No. 4,987,071 to Cech and U.S. Patent 
No. 5,543,508 to Haseloff, which are hereby incorporated by reference. The inclusion of ribozyme 
sequences within antisense RNAs may be used to confer RNA cleaving activity on the antisense RNA, 
such that endogenous mRNA molecules that bind to the antisense RNA are cleaved, which in turn leads 
to an enhanced antisense inhibition of endogenous gene expression. 

Constructs in which the CUT1 nucleic acid (or variants thereon) are over-expressed may also be 
used to obtain co-suppression of the endogenous CUT1 gene in the manner described in U.S. Patent No. 
5,231,020 to Jorgensen. Such co-suppression (also termed sense suppression) does not require that the 
entire CUT1 nucleic acid be introduced into the plant cells, nor does it require that the introduced 
sequence be exactly identical to the CUT1 nucleic acid. However, as with antisense suppression, the 
suppressive efficiency will be enhanced as (1) the introduced sequence is lengthened and (2) the sequence 
similarity between the introduced sequence and the endogenous CUT1 gene is increased. Example 1 
below provides an illustration of co-suppression of the endogenous CUT1 gene by transformation of 
plants with the CUT1 cDNA. 


d. Transformation and Regeneration Techniques 

Transformation and regeneration of both monocotyledonous and dicotyledonous plant cells is 
now routine, and the selection of the most appropriate transformation technique will be determined by 
the practitioner. The choice of method will vary with the type of plant to be transformed; those skilled 
in the art will recognize the suitability of particular methods for given plant types. Suitable methods 
may include, but are not limited to: electroporation of plant protoplasts; liposome-mediated 
transformation; polyethylene glycol-mediated transformation; transformation using viruses; micro-injection of 
plant cells; micro-projectile bombardment of plant cells; vacuum infiltration; and Agrobacterium 
tumefaciens (AT) mediated transformation. Typical procedures for transforming and regenerating plants 
are described in the patent documents listed at the beginning of this section. 



e. Selection of Transformed Plants 

Following transformation and regeneration of plants with the transformation vector, transformed 
plants are preferably selected using a dominant selectable marker incorporated into the transformation 
vector. Typically, such a marker will confer antibiotic resistance on the seedlings of transformed plants, 
and selection of transformants can be accomplished by exposing the seedlings to appropriate 
concentrations of the antibiotic. Example 1 provides an example of such an approach in which seedlings 
were selected using kanamycin. 

After transformed plants are selected and grown to maturity, they can be assayed to determine 
whether VLCFA synthesis has been altered as a result of the introduced transgene. This can be done in 
several ways, including, as described in Example 1, microscopic examination of the epicuticular wax 
layer and chromatographic analysis. Lipids may also be extracted from plant material and analyzed by 
gas chromatography as described by Dooner (1990). In addition, antisense or sense suppression of the 
endogenous CUT1 gene may be detected by analyzing mRNA expression on Northern blots. 

IX. Production of Sequence Variants 

As noted above, modification of VLCFA synthesis in plant cells can be achieved by
transforming plants with CUT1 nucleic acids, antisense constructs based on CUT1 nucleic acid
sequences or other variants on CUT1 nucleic acid sequences. With the provision of the CUT1 cDNA and
genomic sequences herein, the creation of variants on these CUT1 nucleic acid sequences by standard
mutagenesis techniques is now enabled.

Variant DNA molecules include those created by standard DNA mutagenesis techniques, for
example, M13 primer mutagenesis. Details of these techniques are provided in Sambrook et al. (1989),
Ch. 15. By the use of such techniques, variants may be created which differ in minor ways from the
disclosed CUT1 nucleic acids. DNA molecules and nucleotide sequences that are derivatives of those
specifically disclosed herein and which differ from those disclosed by the deletion, addition or
substitution of nucleotides while still encoding a protein which possesses the functional characteristic of
the CUT1 protein (i.e., very long chain fatty acid elongation activity) are comprehended by this
invention. DNA molecules and nucleotide sequences which are derived from the CUT1 nucleic acids
include DNA sequences which hybridize under moderately stringent conditions to the DNA sequences
disclosed, or fragments thereof.

Hybridization conditions resulting in particular degrees of stringency will vary depending upon
the nature of the hybridization method of choice and the composition and length of the hybridizing DNA
used. Generally, the temperature of hybridization and the ionic strength (especially the Na+
concentration) of the hybridization buffer will determine the stringency of hybridization. Calculations
regarding hybridization conditions required for attaining particular degrees of stringency are discussed by
Sambrook et al. (1989), chapters 9 and 11, herein incorporated by reference. By way of illustration
only, a hybridization experiment may be performed by hybridization of a CUT1-derived probe (for
example, the CUT1 cDNA sequence) to a target DNA molecule (for example, the CUT1 homolog from
Zea mays) which has been electrophoresed in an agarose gel and transferred to a nitrocellulose
membrane by Southern blotting (Southern, 1975), a technique well known in the art and described in
Sambrook et al. (1989). Hybridization with a target probe labeled with [32P]-dCTP is generally carried
out in a solution of high ionic strength such as 6xSSC at a temperature that is 20-25°C below the melting
temperature, $T_m$, described below. For such Southern hybridization experiments where the target DNA
molecule on the Southern blot contains 10 ng of DNA or more, hybridization is typically carried out for
6-8 hours using 1-2 ng/ml radiolabeled probe (of specific activity equal to $10^9$ CPM/µg or greater).
Following hybridization, the nitrocellulose filter is washed to remove background hybridization. The
washing conditions should be as stringent as possible to remove background hybridization but to retain a
specific hybridization signal. The term $T_m$ represents the temperature above which, under the prevailing
ionic conditions, the radiolabeled probe molecule will not hybridize to its target DNA molecule. The $T_m$
of such a hybrid molecule may be estimated from the following equation (Bolton and McCarthy, 1962):

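$$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6\,\log_{10}[\mathrm{Na}^+] + 0.41\,(\%\,\mathrm{G{+}C}) - 0.63\,(\%\,\text{formamide}) - \frac{600}{l}$$

where $l$ is the length of the hybrid in base pairs. Because $\log_{10}[\mathrm{Na}^+]$ is negative for Na+
concentrations below 1 M, the salt term lowers $T_m$ as the ionic strength of the solution falls.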
This equation is valid for concentrations of Na+ in the range of 0.01 M to 0.4 M, and it is less
accurate for calculations of $T_m$ in solutions of higher [Na+]. The equation is also primarily valid for
DNAs whose G+C content is in the range of 30% to 75%, and it applies to hybrids greater than
100 nucleotides in length (the behavior of oligonucleotide probes is described in detail in Ch. 11 of
Sambrook et al., 1989).

Thus, by way of example, for a 150 base pair DNA probe derived from the first 150 base pairs
of the open reading frame of the CUT1 cDNA (with a hypothetical %GC = 45%), a calculation of
hybridization conditions required to give particular stringencies may be made as follows:

For this example, it is assumed that the filter will be washed in 0.3 xSSC solution following
hybridization, thereby [Na+] = 0.045 M, %GC = 45%, formamide concentration = 0, $l$ = 150 base
pairs,

$$T_m = 81.5 + 16\,\log_{10}[\mathrm{Na}^+] + (0.41 \times 45) - \frac{600}{150}$$

and so $T_m$ = 74.4°C.

The $T_m$ of double-stranded DNA decreases by 1-1.5°C with every 1% decrease in homology
(Bonner et al., 1973). Therefore, for this given example, washing the filter in 0.3 xSSC at 59.4-64.4°C
will produce a stringency of hybridization equivalent to 90%. Alternatively, washing the hybridized
filter in 0.3 xSSC at a temperature of 65.4-68.4°C will yield a hybridization stringency of 94%. The
above example is given entirely by way of theoretical illustration. One skilled in the art will appreciate
that other hybridization techniques may be utilized and that variations in experimental conditions will
necessitate alternative calculations for stringency.
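
By way of illustration only, the melting temperature estimate and the wash temperature ranges discussed
above may be sketched in Python. The function and parameter names are hypothetical and are not part of
the disclosure. The sketch assumes the commonly cited form of the Bolton and McCarthy relation, in which
the salt term is added with a coefficient of 16.6; the coefficient is exposed as a parameter because the
worked example above reaches 74.4°C with the coefficient taken as 16, whereas 16.6 gives a value slightly
below 74°C.

```python
import math


def estimate_tm(na_molar, gc_percent, formamide_percent, length_bp, salt_coeff=16.6):
    """Estimate the melting temperature (deg C) of a DNA hybrid longer than about 100 bp."""
    return (
        81.5
        + salt_coeff * math.log10(na_molar)  # negative for [Na+] below 1 M
        + 0.41 * gc_percent
        - 0.63 * formamide_percent
        - 600.0 / length_bp
    )


def wash_temperature_range(tm, stringency_percent, drop_low=1.0, drop_high=1.5):
    """Return (lowest, highest) wash temperature in deg C for a target stringency.

    Assumes Tm falls by drop_low to drop_high deg C for each 1% of mismatch.
    """
    mismatch = 100.0 - stringency_percent
    return (tm - drop_high * mismatch, tm - drop_low * mismatch)


# 150 bp probe, 45% G+C, no formamide, 0.3 xSSC wash ([Na+] = 0.045 M)
tm_standard = estimate_tm(0.045, 45, 0, 150)                   # coefficient 16.6
tm_as_worked = estimate_tm(0.045, 45, 0, 150, salt_coeff=16.0)  # coefficient used in the worked example
print(round(tm_standard, 1), round(tm_as_worked, 1))

for stringency in (90, 94):
    low, high = wash_temperature_range(74.4, stringency)
    print(stringency, round(low, 1), round(high, 1))
```

Keeping the coefficient and the per-mismatch decrease as parameters makes it straightforward to repeat the
calculation for other probes, salt concentrations or stringency targets.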

DNA sequences that encode a protein having VLCFA elongase activity and which hybridize to the
disclosed CUT1 nucleic acid sequences under hybridization conditions of at least 75%, more preferably at
least 80%, 85% or 90% stringency, and most preferably at least 95% stringency are encompassed within the
present invention.

The degeneracy of the genetic code further widens the scope of the present invention as it enables
major variations in the nucleotide sequence of a DNA molecule while maintaining the amino acid
sequence of the encoded protein. For example, the fourth amino acid residue of the CUT1 protein is
alanine. This is encoded in the CUT1 ORF by the nucleotide codon triplet GCA. Because of the
degeneracy of the genetic code, three other nucleotide codon triplets (GCT, GCC and GCG) also code for
alanine. Thus, the nucleotide sequence of the CUT1 ORF could be changed at this position to any of these
three codons without affecting the amino acid composition of the encoded protein or the characteristics of
the protein. Based upon the degeneracy of the genetic code, variant DNA molecules may be derived from
the CUT1 nucleic acid molecules disclosed herein using standard DNA mutagenesis techniques as
described above, or by synthesis of DNA sequences. Thus, this invention also encompasses DNA
sequences which encode the CUT1 protein but which vary from the CUT1 nucleic acid sequences by
virtue of the degeneracy of the genetic code.
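
By way of illustration only, the alanine example above may be reproduced with a short Python sketch that
builds the standard genetic code and lists the codons synonymous with a given codon. The names used are
hypothetical, and the sketch assumes the standard (universal) nuclear code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base varying fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(codon):
    """Return the other codons that encode the same amino acid as `codon`."""
    codon = codon.upper()
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid and c != codon)


# Fourth codon of the CUT1 ORF as described in the text (GCA, alanine)
print(CODON_TABLE["GCA"], synonymous_codons("GCA"))
```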

One skilled in the art will recognize that DNA mutagenesis techniques may be used not only to
produce variant DNA molecules, but will also facilitate the production of proteins which differ in certain
structural aspects from the CUT1 protein, yet which proteins are clearly derivative of this protein and
which maintain the essential characteristics of the CUT1 protein. Newly derived proteins may also be
selected in order to obtain variations on the characteristic of the CUT1 protein, as will be more fully
described below. Such derivatives include those with variations in amino acid sequence including minor
deletions, additions and substitutions.

While the site for introducing an amino acid sequence variation is predetermined, the mutation
per se need not be predetermined. For example, in order to optimize the performance of a mutation at a
given site, random mutagenesis may be conducted at the target codon or region and the expressed protein
variants screened for the optimal combination of desired activity. Techniques for making substitution
mutations at predetermined sites in DNA having a known sequence as described above are well known.

Amino acid substitutions are typically of single residues; insertions usually will be on the order
of about 1 to 10 amino acid residues; and deletions will range from about 1 to 30 residues.
Deletions or insertions preferably are made in adjacent pairs, i.e., a deletion of 2 residues or insertion of
2 residues. Substitutions, deletions, insertions or any combination thereof may be combined to arrive at
a final construct. Obviously, the mutations that are made in the DNA encoding the protein must not
place the sequence out of reading frame and preferably will not create complementary regions that could
produce secondary mRNA structure.
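
By way of illustration only, the reading frame requirement stated above may be checked with a minimal
Python sketch. The function name is hypothetical; the sketch simply assumes that a set of insertions and
deletions preserves the downstream reading frame when the net change in nucleotide number is a multiple
of three.

```python
def keeps_reading_frame(inserted_nt=0, deleted_nt=0):
    """Return True if the net change in length leaves the downstream reading frame intact."""
    return (inserted_nt - deleted_nt) % 3 == 0


# Deleting two residues (6 nucleotides) keeps the frame; inserting 4 nucleotides does not.
print(keeps_reading_frame(deleted_nt=6), keeps_reading_frame(inserted_nt=4))
```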

Substitutional variants are those in which at least one residue in the amino acid sequence has
been removed and a different residue inserted in its place. Such substitutions generally are made in
accordance with the following Table 1 when it is desired to finely modulate the characteristics of the
protein. Table 1 shows amino acids which may be substituted for an original amino acid in a protein and
which are regarded as conservative substitutions.

Table 1.

Original Residue    Conservative Substitutions
Ala                 ser
Arg                 lys
Asn                 gln; his
Asp                 glu
Cys                 ser
Gln                 asn
Glu                 asp
Gly                 pro
His                 asn; gln
Ile                 leu; val
Leu                 ile; val
Lys                 arg; gln; glu
Met                 leu; ile
Phe                 met; leu; tyr
Ser                 thr
Thr                 ser
Trp                 tyr
Tyr                 trp; phe
Val                 ile; leu
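
By way of illustration only, Table 1 may be represented as a lookup structure for screening proposed
substitutions. The following Python sketch uses hypothetical names; it encodes the table exactly as listed,
so the relation is directional (for example, Lys to Glu is listed, whereas Glu to Lys is not).

```python
# Table 1 as a mapping from the original residue to its listed conservative substitutions.
CONSERVATIVE_SUBSTITUTIONS = {
    "Ala": {"Ser"},
    "Arg": {"Lys"},
    "Asn": {"Gln", "His"},
    "Asp": {"Glu"},
    "Cys": {"Ser"},
    "Gln": {"Asn"},
    "Glu": {"Asp"},
    "Gly": {"Pro"},
    "His": {"Asn", "Gln"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"Met", "Leu", "Tyr"},
    "Ser": {"Thr"},
    "Thr": {"Ser"},
    "Trp": {"Tyr"},
    "Tyr": {"Trp", "Phe"},
    "Val": {"Ile", "Leu"},
}


def is_conservative(original, replacement):
    """Return True if Table 1 lists `replacement` as a conservative substitution for `original`."""
    return replacement.capitalize() in CONSERVATIVE_SUBSTITUTIONS.get(original.capitalize(), set())


print(is_conservative("Lys", "Arg"), is_conservative("Gly", "Ala"))
```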



Substantial changes in enzymatic function or other features are made by selecting substitutions
that are less conservative than those in Table 1, i.e., selecting residues that differ more significantly in
their effect on maintaining (a) the structure of the polypeptide backbone in the area of the substitution,
for example, as a sheet or helical conformation, (b) the charge or hydrophobicity of the molecule at the
target site, or (c) the bulk of the side chain. The substitutions which in general are expected to produce
the greatest changes in protein properties will be those in which (a) a hydrophilic residue, e.g., seryl or
threonyl, is substituted for (or by) a hydrophobic residue, e.g., leucyl, isoleucyl, phenylalanyl, valyl or
alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an
electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative
residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g., phenylalanine, is
substituted for (or by) one not having a side chain, e.g., glycine.

The effects of these amino acid substitutions or deletions or additions may be assessed for
derivatives of the CUT1 protein by analyzing the ability of the derivative proteins to catalyze the addition
of C2 units to existing VLCFA units. These assays may conveniently be performed using the yeast-based
systems for assaying fatty acid elongation described below.

X. Production of recombinant CUT1 protein
using heterologous expression systems

Many different expression systems are available for expressing cloned nucleic acid molecules.
Examples of prokaryotic and eukaryotic expression systems that are routinely used in laboratories are
described in Chapters 16-17 of Sambrook et al. (1989), which are herein incorporated by reference. Such
systems may be used to express CUT1 protein and derivatives of this protein at high levels to facilitate
purification and functional analysis of the enzyme. Apart from permitting the activity of the enzyme to be
determined (which is particularly useful to assess the activity of homologous and derivative proteins),
heterologous expression facilitates other uses of the purified enzyme. For example, the purified enzyme
produced by recombinant means may be used to synthesize VLCFAs and other fatty acid metabolites in
vitro, particularly radio- or fluorescent-labeled forms of VLCFAs and metabolites. These molecules may
be used as tracers to determine the location in plant tissues and cells of VLCFAs and their metabolites.
The purified recombinant enzyme may also be used as an immunogen to raise enzyme-specific antibodies.
Such antibodies are useful both as research reagents (such as in the study of VLCFA regulation in plants)
and diagnostically to determine expression levels of the enzyme in agricultural products, including
pollen.

By way of example only, high level expression of the CUT1 protein may be achieved by cloning
and expressing the cDNA in yeast cells using the pYES2 yeast expression vector (Invitrogen, San Diego,
CA). Secretion of the recombinant CUT1 from the yeast cells may be achieved by placing a yeast signal
sequence adjacent to the CUT1 coding region. A number of yeast signal sequences have been
characterized, including the signal sequence for yeast invertase. This sequence has been successfully used
to direct the secretion of heterologous proteins from yeast cells, including such proteins as human
interferon (Chang et al., 1986), human lactoferrin (Liang and Richardson, 1993) and prochymosin (Smith
et al., 1985). Alternatively, the enzyme may be expressed at high level in standard prokaryotic expression
systems, such as E. coli.

XI. Assays for VLCFA elongase activity

To aid the biochemical characterization of the CUT1 protein, or variants of this protein, the
very long chain fatty acid elongase activity of the proteins may be determined by expressing the cDNA
molecule which encodes the protein in question in yeast. For that purpose, the full-length coding region of
the cDNA may be linked to the galactose-inducible GAL1 promoter in the Saccharomyces cerevisiae
expression vector, pYES2 (Invitrogen). The yeast expressing the subject protein may then be employed
to determine the substrate specificity of the CUT1 protein by one of the following approaches.

a. In vitro assay for VLCFA elongase activity using
cell-free yeast homogenate

To determine the range of substrates recognized by the subject protein, acyl elongation activity
is measured using substrates of varying carbon chain lengths and degrees of unsaturation. In each case,
15 µM of an [1-14C]acyl-CoA (C18, C20, C22, C24 in 0.005% Triton X-100) is added to a standard
assay mixture containing 80 mM Hepes-KOH, pH 7.2, 5% glycerol, 1 mM DTT, 0.5 mM NADPH, 1
mM ATP, 5 mM MgCl2, 1 mM malonyl-CoA, and an aliquot of cell-free extract (50 µg protein) in a
final volume of 50 µL. Incubation is carried out at 30°C for 1 h. The reaction is stopped with 100 µL
of 4 N KOH in 80% methanol and the lipids saponified for 1 h at 80°C. The mixture is then acidified by
adding 100 µL of cold 6 N HCl and extracted twice with 500 µL of cold hexane. The pooled hexane
fractions are dried under N2, followed by transmethylation for product analyses.

b. In vivo assay: Feeding of transformed yeast cells with
radiolabelled acyl-Tween substrates

A second approach for determining substrate specificity involves growth of yeast cells in the
presence of various [1-14C]acyl-Tween substrates (C18, C20, C22, C24; Terzaghi, 1986). Fatty acyl
substrates provided in the growth medium as Tween-fatty acid esters are readily taken up from the
medium and used by the cells. For each FAE protein, yeast cells are initially grown in the presence of
several concentrations of a single acyl-Tween substrate for different lengths of time to determine the
optimal substrate concentration and the duration of the feeding assays. Once these parameters are
established, yeast cells expressing the subject protein and control cells containing empty pYES2 plasmid
are grown in a defined medium in the presence of a single radiolabelled acyl-Tween substrate. At the
end of the experiment, cells are pelleted, and then resuspended in 1 mL of 1 N methanolic-HCl
(Supelco). Treatment with methanolic-HCl converts fatty acids to methyl esters (FAMEs). Radiolabelled
FAMEs are analyzed as described below, to characterize the products generated by elongation of each
acyl-Tween substrate. A comparison of radiolabelled FAMEs from CUT1-containing yeast with FAMEs
isolated from control cells allows the determination of the elongation specificity of the subject FAE
protein.



c. Product analyses 

The products of the elongation assays obtained in (a), or pelleted yeast cells from experiment
(b), are transmethylated in a sealed tube using 1 N methanolic-HCl (Supelco) at 80°C for 1 h. Samples
are then extracted twice with 500 µL of hexane after the addition of 1 mL of 0.9% NaCl, and the pooled
extracts containing FAMEs concentrated under N2. Radiolabelled FAMEs are applied on KC18 reverse-phase
TLC plates (Whatman), and separated in acetonitrile:tetrahydrofuran (85:15, v/v). Products of
TLC separation are identified by co-chromatography with FAME standards, or by GC-MS. In addition,
FAMEs may be scraped from the TLC plates and their radioactivity determined by liquid scintillation
counting.

EXAMPLES 

The following examples serve to illustrate various applications of the present invention.

Example one: Modification of A. thaliana Wax Production By
Transformation with the CUT1 cDNA

a. Construction of binary transformation vector

The CUT1 cDNA was cleaved out of the vector λZipLox (with KpnI-BamHI) and the resulting
1.85 kb fragment was directionally subcloned into the KpnI-BamHI sites of pGEM7Z(f) (Promega,
Madison, WI). The resulting plasmid was then fully cleaved with XhoI, but only partially cleaved with
SstI (since the CUT1 cDNA has an internal SstI site). The 1.9 kb product was isolated on an agarose
gel and directionally subcloned into the SalI and SstI sites of the vector pJD330 (Shaul and Galili, 1992).
This vector contains the 35S promoter of the cauliflower mosaic virus (CaMV) which provides
constitutive expression in Arabidopsis. The subcloning results in the CUT1 cDNA being inserted in a
sense orientation with respect to the CaMV 35S promoter. The pJD330-CUT1 cDNA construct was
ligated with pBIN19 and the resulting binary vector was designated p35S-CUT1. This binary vector was
transformed into the Agrobacterium tumefaciens strain GV3101 (Koncz and Schell, 1986), and
transformants were selected on LB medium containing 25 µg/mL gentamycin and 50 µg/mL kanamycin.

b. Transformation of Arabidopsis
with the p35S-CUT1 transgene

Arabidopsis thaliana (L.) Heynh. ecotype Columbia was transformed using a combination of in
planta (Chang et al., 1994; Katavic et al., 1994) and vacuum infiltration methods (Bechtold et al.,
1993). Plants were grown until the primary inflorescence shoots reached 1-2 cm in height, and then
these bolts were cut off. The wound site was inoculated with 50 mL of an overnight Agrobacterium
culture harbouring the p35S-CUT1 plasmid. After 4-6 days a number of secondary inflorescences that
appeared were cut off, and vacuum infiltration was performed on these plants using the conditions
described by Bechtold et al. (1993). Screening for transformed seed was done as described previously
(Katavic et al., 1994). Briefly, seeds from infiltrated plants were plated out (approximately 1500
seeds/plate) on solid minimal salts nutrient medium supplemented with 50 µg/mL kanamycin. Seedlings
that showed resistance were visible after approximately 8 days, because they turned green and elongated.
Plants that were derived from seed harvested from different pots were considered as independent lines.
Designations of transformed plants were as follows: the infiltrated plant, T1; primary transformants, T2;
etc., as outlined in Katavic et al. (1994). Plants were grown at 20°C under continuous fluorescent
illumination (100 µE m-2 s-1).

c. CUT1-suppressed plants
have altered wax composition

Using the above transformation methods, 46 kanamycin-resistant plants were obtained from
seven different pots of Arabidopsis. Of the 46 plants obtained, 36 appeared waxless, having a glossy or
eceriferum (cer) phenotype. At least one cer line was obtained from each pot, implying that at least
seven independent events had occurred in obtaining these lines. The surfaces of these cer plants were
examined by a scanning electron (SE) microscope. SE micrographs clearly demonstrate that while wild-type
plants were covered with the characteristic crystals of the epicuticular wax layer, transgenic cer
plants were completely devoid of any wax crystals, implying that a severe cer phenotype has been
created.



WO 98/46766                                                        PCT/CA98/00343



Plant tissue from the transgenic lines was analyzed for fatty acid composition. Plant tissue was
immersed for 10 seconds in a 2:1 chloroform:methanol solution to remove surface waxes. Extracts were
then evaporated to dryness under a stream of nitrogen. Waxes were dissolved in 100 µL of
N,O-bis(trimethylsilyl)trifluoroacetamide with 1% trimethylchlorosilane (Pierce), and derivatized at 80°C for
1 hour. Samples were analyzed in a Hewlett-Packard 5890 series II gas chromatograph equipped with a
flame ionization detector, using either a DB-1 column or a DB-5 column.

GLC analyses were performed at the initial temperature of 150°C, followed by a ramping of
4°C/min to 320°C, where it was held for 10 min. Peaks were identified by the comparison of retention times
to reference standards, and mass spectrometry. Quantification was based on flame ionization detector peak
areas, which were converted to mass units by comparison to the internal standard, 17:0 methyl ester, which
was added to each sample prior to the extraction.
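
By way of illustration only, the conversion of peak areas to mass units by reference to an internal standard
may be sketched in Python as follows. The function name, the peak areas and the amount of standard are
hypothetical and are not data from the analyses described; the sketch also assumes that the detector response
per unit mass is the same for each wax component as for the internal standard.

```python
def masses_from_peak_areas(peak_areas, standard_area, standard_mass_ug):
    """Convert FID peak areas to mass (micrograms) relative to an internal standard.

    Assumes an equal detector response per unit mass for every component.
    """
    return {
        name: area / standard_area * standard_mass_ug
        for name, area in peak_areas.items()
    }


# Hypothetical peak areas and a hypothetical 10 microgram spike of internal standard
example_areas = {"C29 alkane": 5000.0, "C29 ketone": 2500.0}
print(masses_from_peak_areas(example_areas, standard_area=1000.0, standard_mass_ug=10.0))
```

Dividing the resulting masses by the dry weight of the extracted tissue would then give a wax load per gram
of dry weight.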

For wax load determinations only the principal surface lipids were measured: n-nonacosane (C29
alkane), 14- and 15-nonacosanol (C29 secondary alcohol), 15-nonacosanone (C29 ketone), C22-C30
aldehydes, C22-C30 primary alcohols and C16-C30 fatty acids (Hannoufa et al., 1993). The total area %
of these peaks accounted for more than 90% of the total area % of the sample.

The wax constituents that are found on the stems of Arabidopsis plants originate from two
biosynthetic pathways (Figure 1). The decarbonylation pathway is the major pathway, which utilizes
aldehydes to produce alkanes, secondary alcohols and ketones. In Arabidopsis (ecotype Columbia), the
C29 species of the wax components produced by this pathway account for almost 90% of all the stem wax.
The second pathway, the acyl-reduction pathway, produces primary alcohols, which account for
approximately 5% of the total stem wax. Fatty acids and aldehydes, which are precursors for all the other
wax components, are shared by both biosynthetic pathways and make up the remaining 5%.

Wax composition and quantity on the stems of wild-type and several transgenic lines were
examined. Wild-type Arabidopsis stems contained on average 7106 ± 1184 µg of wax/g dry wt. In
contrast, wax loads on the stems of all shiny CUT1-suppressed lines were severely reduced. For example,
the wax load on the stems of the most severe line U 5 totals 483 ± 83, only 6-7% of the wild-type wax
accumulation.
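
By way of illustration only, the comparison of wax loads stated above may be checked with a short Python
sketch. The function name is hypothetical, and the sketch uses the mean values only, without propagating
the reported variation.

```python
def percent_of_wild_type(transgenic_load, wild_type_load):
    """Express a transgenic wax load as a percentage of the wild-type wax load."""
    return 100.0 * transgenic_load / wild_type_load


# Mean stem wax loads given in the text (same units for both values)
print(round(percent_of_wild_type(483, 7106), 1))
```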

Analysis of wax composition of CUT1-suppressed plants revealed that the decarbonylation
pathway is almost completely shut down. The C30 aldehyde, C29 alkane, C29 secondary alcohol and C29
ketone reach only 3.5%, 2.2%, 1.4% and 2.2% of the levels found on wild-type plants, respectively.
CUT1 suppression also has a major effect on the acyl-reduction pathway, causing a reduction in the levels
of primary alcohols of over 50%. In addition, the relative abundance of different classes of alcohols is
changed. C30 and C28 alcohols, the major alcohol species in wild-type stems, have decreased by 90%, and
C24 alcohol is the most abundant class in CUT1-suppressed lines. The C24 species are also the most
abundant classes of aldehydes and fatty acids in waxless transgenic plants. The described compositional
changes were consistent in all 13 different CUT1-suppressed lines analyzed. These changes support the
proposal that the role of the CUT1 enzyme is elongation of the fatty acyl chain beyond 24 carbons.

Example two: Production of conditionally male sterile CUT1-suppressed plants

CUT1-suppressed Arabidopsis plants were produced as described in Example one and analyzed
for male sterility. This analysis demonstrated that, in addition to stem and leaf wax synthesis, the CUT1
gene product has an essential role in pollen development. Similar to the cer6-2 (Preuss et al., 1993) and cer1
(Aarts et al., 1995) wax-deficient mutants of Arabidopsis, CUT1-suppressed plants are completely male
sterile under normal growth conditions (30 to 40% relative humidity) although they produce normal
amounts of pollen. However, when grown under high humidity (90 to 100%), pollen fertility is restored to
the wild-type level, indicating that male sterility/fertility is conditional and environmentally controlled,
just as in the cer6-2 and cer1 mutants. For these two mutants, conditional male sterility is explained by
alterations in the composition and content of the wax components of the tryphine layer covering the pollen
grain. These long chain lipid molecules, produced in the tapetum layer of the anther (Preuss et al., 1993),
are needed in the tryphine for proper pollen-pistil signalling and pollen germination. Thus, in their
absence, sterility occurs. Conditional male sterility is a valuable trait for plant breeders; being able to
selectively inhibit self-fertilization of plants facilitates the production of hybrid plants. Accordingly, the
CUT1 cDNA and derivatives thereof may be useful in producing conditionally male sterile plants useful
in breeding programs.

Taken together, the results of Examples one and two confirm that CUT1 encodes a condensing
enzyme that is involved in VLCFA biosynthesis of waxes which accumulate in the plant epidermis, as
well as waxes required for the development of functional pollen grains. In addition, the results show that
transformation of plants using the CUT1 cDNA is useful to produce plants having modified VLCFA
compositions, as well as plants that exhibit conditional male sterility.

Example three: Use of CUT1 gene promoter sequence

The promoter of the CUT1 gene confers epidermis-specific expression. Accordingly, this promoter
sequence may be used to produce transgene constructs that are specifically expressed in epidermal cells.
Effective epidermis-specific expression may be achieved with less than the entire 1951 bases of sequence
upstream of the CUT1 ORF shown in Seq. I.D. No. 1. Thus, by way of example, epidermis-specific
expression may be obtained by employing the 1209 base pair promoter fragment. One of skill in the art will
recognize that still smaller regions of the sequence upstream of the CUT1 ORF may be used to obtain
epidermis-specific expression, such as a 50 base pair or 100 base pair region of the disclosed promoter
sequence.
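
By way of illustration only, a sub-region of the upstream sequence may be extracted for testing with a short
Python sketch. The function name is hypothetical, and the sketch assumes that the fragment of interest lies
immediately 5' of the first base of the ORF; the text does not specify the coordinates of any particular
fragment. A toy sequence is used in place of the disclosed sequence.

```python
def upstream_fragment(sequence, orf_start, length):
    """Return the `length` bases immediately 5' of the ORF.

    `orf_start` is the 1-based position of the first base of the start codon.
    """
    upstream = sequence[: orf_start - 1]
    if length > len(upstream):
        raise ValueError("requested fragment is longer than the available upstream sequence")
    return upstream[len(upstream) - length:]


# Toy sequence for demonstration only: nine upstream bases followed by an ATG start codon
toy_sequence = "AAACCCGGGATGTTT"
toy_orf_start = toy_sequence.index("ATG") + 1  # 1-based position of the A in ATG
print(upstream_fragment(toy_sequence, toy_orf_start, 3))
```

Applied to a sequence in which the ORF begins at position 1952, calls with lengths of 1209, 100 or 50 would
return candidate promoter sub-regions of those sizes under the stated assumption.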

The determination of whether a particular sub-region of the disclosed sequence operates to confer
effective epidermis-specific expression in a particular system (taking into account the plant species into which
the construct is being introduced, the level of expression required, etc.) will be performed using known
methods, such as operably linking the promoter sub-region to a marker gene (e.g. GUS), introducing such
constructs into plants and then determining the level of expression of the marker gene in epidermis and other
plant tissues.

The present invention therefore facilitates the production, by standard molecular biology techniques,
of nucleic acid molecules comprising this promoter sequence operably linked to a nucleic acid sequence, such
as an open reading frame. Suitable open reading frames include open reading frames encoding any protein
for which epidermis-specific expression is desired.

Having illustrated and described the principles of isolating CUT1 nucleic acids, the CUT1 protein
and modes of use of these biological molecules, it should be apparent to one skilled in the art that the
invention can be modified in arrangement and detail without departing from such principles. We claim all
modifications coming within the spirit and scope of the claims presented herein.

SEQUENCE LISTING

(1) GENERAL INFORMATION

    (i) APPLICANT: The University of British Columbia
    (ii) TITLE OF INVENTION: Nucleic Acids Encoding Plant Enzyme
         Involved In Very Long Chain Fatty Acid Synthesis
    (iii) NUMBER OF SEQUENCES: 12
    (iv) CORRESPONDENCE ADDRESS:
        (A) ADDRESSEE: Sim & McBurney
        (B) STREET: 6th Floor, 330 University Avenue
        (C) CITY: Toronto
        (D) PROVINCE: Ontario
        (E) COUNTRY: Canada
        (F) POSTAL CODE: M5G 1R7
    (v) COMPUTER READABLE FORM:
        (A) MEDIUM TYPE: Disk, 3.5-inch
        (B) COMPUTER: IBM PC compatible
        (C) OPERATING SYSTEM: Windows 95
        (D) SOFTWARE: ASCII
    (vi) CURRENT APPLICATION DATA:
        (A) APPLICATION NUMBER:
        (B) FILING DATE: 14 April 1998
        (C) CLASSIFICATION:
    (vii) PRIOR APPLICATION DATA:
        (A) APPLICATION NUMBER: 60/043,831
        (B) FILING DATE: April 14, 1997
    (vii) PRIOR APPLICATION DATA:
        (A) APPLICATION NUMBER:
        (B) FILING DATE: April 10, 1998
    (viii) PATENT AGENT INFORMATION:
        (A) NAME: RAE, Patricia A.
        (B) REFERENCE/DOCKET NUMBER: 3055-18/PAR
    (ix) TELECOMMUNICATION INFORMATION:
        (A) TELEPHONE: (416) 595-1155
        (B) TELEFAX: (416) 595-1163

(2) INFORMATION FOR SEQ ID NO: 1:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 3712
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: double
        (D) TOPOLOGY: linear
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

45 TAGTGCTTTA TATATGTTTG ATACTTCTGT TTGGCAATAT CAATCATAGT 50 

AGAAAAGATA TGGACTTCAT TTGAGGTTTT TGGTGGATTG TGTCTATATG 100 

TGAAATCATG GGATCTCAAG ATTTGTCTGC ATTCAGTTTC CAAGTCAAAC 150 

50 

ATCGTAACTA CTGTTTGATT TTCCCTCATG CTTGCAGTTT TCATGGATAT 200 

CTCAAGATTT GTCTTCTTGC ACTTTCCAAG TCAAACATAA AGTAACTACTT 250 
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GATTGATATT CCCTCGTGTA TTACCCTCTT TCAAATGACA CAATTGGGCC 3 00 
CAAGTAGAGG AATTTCATAG TGAATTCAAA AGATTAACTG TATTCCACCG 3 50 
5 TCGTATTTTG ATAACATTTA GTTATTCCTT TTCTTTTTTT TCTTCTGCAA 4 00 
CAGTTTTTTT TTAATACATT TAGTGTTGGT TTGGTTCAAT GAAATATTAT 450 
ATGTTACTTC TTTTTTTGGA AATAAATTAT TCATTCTTTC TACTATAAAA 500 

10 

GGAATTGTTC ATGCTTTTTT GATACAATAG TATACCATTT CAAAAGATAC 550 
CATAGACCAG TTATTACATG AATCGCCAAA ACAACACTAA AATCAGAAAA 600 
15 TCAGTATATT TTGGTATAGT CTCCAACATA CAATCATAAA ACCTCTGTGA 650 
AATTTAAAAT CTATATTTGA CATTTCAAAG TTTAACAACA TAGTTCTAAA 700 
TAATTACCTA AATTTTAAGT CAAATGTGAA TTATATTTTA CTCTTCGATA 750 

20 

TCGGTTGTTG ACGATTAACC ATGCAAAAAA GAAACATTAA TTGCGAATGT 800 
AAATAACAAA ACATGTAACT CTTGTAGATA TACATGTATC GACATTTAAA 850 
25 CCCGAATATA TATGTATACC TATAATTTCT CTGATTTTCA CGCTACCTGC 900 
CACGTACATG GGTGATAGGT CCAAACTCAC AAGTAAAAGT TTACGTACAG 950 

TGAATTCGTC TTTTTGGGTA TAAACGTACA TTTAATTTAC ACGTAAGAAA 1000 

30 

GGATTACCAA TTCTTTCATT TATGGTACCA GACAGAGTTA AGGCAAACAA 1050 

GAGAAACATA TAGAGTTTTG ATATGTTTTC TTGGATAAAT ATTAAATTGA 1100 

35 TGCAATATTT AGGGATGGAC ACAAGGTAAT ATATGCCTTT TAAGGTATAT 1150 

GTGCTATATG AATCGTTTCG CATGGGTACT AAAATTATTT GTCCTTACTT 1200 

TATATAAACA AATTCCAACA AAATCAAGTT TTTGCTAAAA CTAGTTTATT 1250 

40 

TGCGGGTTAT TTAATTACCT ATCATATTAC TTGTAATATC ATTCGTATGT 1300 

TAACGGGTAA ACCAAACCAA ACCGGATATT GAACTATTAA AAATCTTGTA 13 50 

45 AATTTGACAC AAACTAATGA ATATCTAAAT TATGTTACTG CTATGATAAC 14 00 

GACCATTTTT GTTTTTGAGA ACCATAATAT AAATTACAGG TACGTGACAA 14 50 

GTACTAAGTA TTTATATCCA CCTTTAGTCA CAGTACCAAT ATTGCGCCTA 1500 

50 

CCGGGCAACG TGAACGTGAT CATCAAATCA AAGTAGTTAC CAAACGCTTT 1550 

GATCTCGATA AAACTAAAAG CTGACACGTC TTGCTGTTTC TTAATTTATT 1600 
TCTCTTACAA CGACAATTTT GAGAAATATG AAATTTTTAT ATCGAAAGGG 1650

AACAGTCCTT ATCATTTGCT CCCATCACTT GCTTTTGTCT AGTTACAACT 1700

GGAAATCGAA GAGAAGTATT ACAAAAACAT TTTTCTCGTC ATTTATAAAA 1750

AAATGACAAA AAATTAAATA GAGAGCAAAG CAAGAGCGTT GGGTGACGTT 1800

GGTCTCTTCA TTAACTCCTC TCATCTACCC CTTCCTCTGT TCGCCTTTAT 1850

10 

ATCCTTCACC TTCCCTCTCT CATCTTCATT AACTCATCTT CAAAAATACC 1900 

CTAATCACAT TTTGTAACAA TAATACAATT ATACATTAAA ACTCTCCGAC 1950 

G ATG CCT CAG GCA CCG ATG CCA GAG TTC TCT AGC TCG GTG 1990
Met Pro Gln Ala Pro Met Pro Glu Phe Ser Ser Ser Val
1 5 10

AAG CTC AAG TAC GTG AAA CTT GGT TAC CAA TAT TTG GTT AAC 2032
Lys Leu Lys Tyr Val Lys Leu Gly Tyr Gln Tyr Leu Val Asn
15 20 25

CAT TTC TTG AGT TTT CTT TTG ATC CCG ATC ATG GCT ATT GTC 2074
His Phe Leu Ser Phe Leu Leu Ile Pro Ile Met Ala Ile Val
30 35 40

GCC GTT GAG CTT CTT CGG ATG GGT CCT GAA GAG ATC CTT AAT 2116 
Ala Val Glu Leu Leu Arg Met Gly Pro Glu Glu lie Leu Asn 
45 50 55 

30 

GTT TGG AAT TCA CTC CAG TTT GAC CTA GTT CAG GTT CTA TGT 2158 
Val Trp Asn Ser Leu Gin Phe Asp Leu Val Gin Val Leu Cys 
60 65 

TCT TCC TTC TTT GTC ATC TTC ATC TCC ACT GTT TAC TTC ATG 2200
Ser Ser Phe Phe Val Ile Phe Ile Ser Thr Val Tyr Phe Met
70 75 80

TCC AAG CCA CGC ACC ATC TAC CTC GTT GAC TAT TCT TGT TAC 2242
Ser Lys Pro Arg Thr Ile Tyr Leu Val Asp Tyr Ser Cys Tyr
85 90 95

AAG CCA CCT GTC ACG TGT CGT GTC CCC TTC GCA ACT TTC ATG 22 84 
Lys Pro Pro Val Thr Cys Arg Val Pro Phe Ala Thr Phe Met 
45 100 105 110 

GAA CAC TCT CGT TTG ATC CTC AAG GAC AAG CCT AAG AGC GTC 2326 
Glu His Ser Arg Leu lie Leu Lys Asp Lys Pro Lys Ser Val 
115 120 125 

50 

GAG TTC CAA ATG AGA ATC CTT GAA CGT TCT GGC CTC GGT GAG 2368
Glu Phe Gln Met Arg Ile Leu Glu Arg Ser Gly Leu Gly Glu
130 135

GAG ACT TGT CTC CCT CCG GCT ATT CAT TAT ATT CCT CCC ACA 2410
Glu Thr Cys Leu Pro Pro Ala Ile His Tyr Ile Pro Pro Thr
140 145 150

5 CCA ACC ATG GAC GCG GCT AGA AGC GAG GCT CAG ATG GTT ATC 24 52 
Pro Thr Met Asp Ala Ala Arg Ser Glu Ala Gin Met Val lie 
155 160 165 

TTC GAG GCC ATG GAC GAT CTT TTC AAG AAA ACC GGT CTT AAA 24 94 
10 Phe Glu Ala Met Asp Asp Leu Phe Lys Lys Thr Gly Leu Lys 
170 175 180 

CCT AAA GAC GTC GAC ATC CTT ATC GTC AAC TGC TCT CTT TTC 2 536 
Pro Lys Asp Val Asp lie Leu lie Val Asn Cys Ser Leu Phe 
15 185 190 195 

TCT CCC ACA CCA TCG CTC TCA GCT ATG GTC ATC AAC AAA TAT 2578
Ser Pro Thr Pro Ser Leu Ser Ala Met Val Ile Asn Lys Tyr
200 205

20 

AAG CTT AGG AGT AAT ATC AAG AGC TTC AAT CTT TCG GGG ATG 2620 
Lys Leu Arg Ser Asn lie Lys Ser Phe Asn Leu Ser Gly Met 
210 215 220 

25 GGC TGC AGC GCG GGC CTG ATC TCA GTT GAT CTA GCC CGC GAC 2662 
Gly Cys Ser Ala Gly Leu lie Ser Val Asp Leu Ala Arg Asp 
225 230 235 

TTG CTC CAA GTT CAT CCC AAT TCA AAT GCA ATC ATC GTC AGC 2704 
30 Leu Leu Gin Val His Pro Asn Ser Asn Ala lie lie Val Ser 
240 245 250 

ACG GAG ATC ATA ACG CCT AAT TAC TAT CAA GGC AAC GAG AGA 2746
Thr Glu Ile Ile Thr Pro Asn Tyr Tyr Gln Gly Asn Glu Arg
255 260 265

GCC ATG TTG TTA CCC AAT TGT CTC TTC CGC ATG GGT GCG GCA 2788 
Ala Met Leu Leu Pro Asn Cys Leu Phe Arg Met Gly Ala Ala 
270 275 

40 

GCC ATA CAC ATG TCA AAC CGC CGG TCT GAC CGG TGG CGA GCC 2830
Ala Ile His Met Ser Asn Arg Arg Ser Asp Arg Trp Arg Ala
280 285 290

45 AAA TAC AAG CTT TCC CAC CTC GTC CGG ACA CAC CGT GGC GCT 2 872 
Lys Tyr Lys Leu Ser His Leu Val Arg Thr His Arg Gly Ala 
295 300 305 

GAC GAC AAG TCT TTC TAC TGT GTC TAC GAA CAG GAA GAC AAA 2 914 
50 Asp Asp Lys Ser Phe Tyr Cys Val Tyr Glu Gin Glu Asp Lys 
310 315 320 

GAA GGA CAC GTT GGC ATC AAC TTG TCC AAA GAT CTC ATG GCC 2956
Glu Gly His Val Gly Ile Asn Leu Ser Lys Asp Leu Met Ala
325 330 335

ATC GCC GGT GAA GCC CTC AAG GCA AAC ATC ACC ACA ATA GGT 2998 
lie Ala Gly Glu Ala Leu Lys Ala Asn lie Thr Thr lie Gly 
340 345 

CCT TTG GTC CTA CCG GCG TCA GAA CAA CTT CTC TTC CTC ACG 3 04 0 
Pro Leu Val Leu Pro Ala Ser Glu Gin Leu Leu Phe Leu Thr 
350 355 360 

TCC CTA ATC GGA CGT AAA ATC TTC AAC CCG AAA TGG AAA CCA 3 082 
Ser Leu He Gly Arg Lys He Phe Asn Pro Lys Trp Lys Pro 
365 370 375 

15 TAC ATA CCG GAT TTC AAG CTG GCC TTC GAA CAC TTT TGC ATT 3124 
Tyr He Pro Asp Phe Lys Leu Ala Phe Glu His Phe Cys He 
380 385 390 

CAC GCA GGA GGC AGA GCG GTG ATC GAC GAG CTC CAA AAG AAT 3166 
20 His Ala Gly Gly Arg Ala Val He Asp Glu Leu Gin Lys Asn 
395 400 405 

CTA CAA CTA TCA GGA GAA CAC GTT GAG GCC TCA AGA ATG ACA 3 208 
Leu Gin Leu Ser Gly Glu His Val Glu Ala Ser Arg Met Thr 
25 410 415 

CTA CAT CGT TTT GGT AAC ACG TCA TCT TCA TCG TTA TGG TAC 3 2 50 
Leu His Arg Phe Gly Asn Thr Ser Ser Ser Ser Leu Trp Tyr 
420 425 430 

30 

GAG CTT AGC TAC ATC GAG TCT AAA GGG AGA ATG AGG AGA GGC 3292 
Glu Leu Ser Tyr He Glu Ser Lys Gly Arg Met Arg Arg Gly 
435 440 445 

35 GAT CGC GTT TGG CAA ATC GCG TTT GGG AGT GGT TTC AAG TGT 3 3 34 
Asp Arg Val Trp Gin He Ala Phe Gly Ser Gly Phe Lys Cys 
450 455 460 

AAC TCT GCC GTG TGG AAA TGT AAC CGT ACG ATT AAG ACA CCT 3376
Asn Ser Ala Val Trp Lys Cys Asn Arg Thr Ile Lys Thr Pro
465 470 475

AAG GAC GGA CCA TGG TCC GAT TGT ATC GAC CGT TAC CCT GTC 3418
Lys Asp Gly Pro Trp Ser Asp Cys Ile Asp Arg Tyr Pro Val
480 485

TTT ATT CCC GAA GTT GTC AAA CTC TAA ACTGA 3450
Phe Ile Pro Glu Val Val Lys Leu
490 495



AAACGTCTTT GAACGGTTTA GTAACGGTTT GATTTTGTGT TACGGTTTAG 3500
GTTTATTTGG TCTCGGGATT TGGTTTAAAG GGGATTGAGA AATGGGAAGT 3550
TAGAGAGAAG AAAAAGCAAA GCATAAATGT TTGTATTTAA TTGCTCTGCA 3600

TATACTTAAA TCTCTGCTTT TCATTTGGGG TATTTTTTAG TTTCTCGTGC 3650

TGTAATTAAT AACTTGTGGT GTACTCAAAT AAGAATATTT CTCTCTGTTT 3700

AAAAAAAAAA AAAAAAAAAA AA 3712 

(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1807 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

AAATACC 7 
CTAATCACAT TTTGTAACAA TAATACAATT ATACATTAAA ACTCTCCGAC 57 

20 

G ATG CCT CAG GCA CCG ATG CCA GAG TTC TCT AGC TCG GTG 97
Met Pro Gln Ala Pro Met Pro Glu Phe Ser Ser Ser Val
1 5 10

AAG CTC AAG TAC GTG AAA CTT GGT TAC CAA TAT TTG GTT AAC 139
Lys Leu Lys Tyr Val Lys Leu Gly Tyr Gln Tyr Leu Val Asn
15 20 25

CAT TTC TTG AGT TTT CTT TTG ATC CCG ATC ATG GCT ATT GTC 181 
30 His Phe Leu Ser Phe Leu Leu He Pro He Met Ala He Val 
30 35 40 

GCC GTT GAG CTT CTT CGG ATG GGT CCT GAA GAG ATC CTT AAT 223 
Ala Val Glu Leu Leu Arg Met Gly Pro Glu Glu He Leu Asn 
35 45 50 55 

GTT TGG AAT TCA CTC CAG TTT GAC CTA GTT CAG GTT CTA TGT 265 
Val Trp Asn Ser Leu Gin Phe Asp Leu Val Gin Val Leu Cys 
60 65 

40 

TCT TCC TTC TTT GTC ATC TTC ATC TCC ACT GTT TAC TTC ATG 3 07 

Ser Ser Phe Phe Val He Phe He Ser Thr Val Tyr Phe Met 
70 75 80 

45 TCC AAG CCA CGC ACC ATC TAC CTC GTT GAC TAT TCT TGT TAC 349 
Ser Lys Pro Arg Thr He Tyr Leu Val Asp Tyr Ser Cys Tyr 
85 90 95 

AAG CCA CCT GTC ACG TGT CGT GTC CCC TTC GCA ACT TTC ATG 391 
50 Lys Pro Pro Val Thr Cys Arg Val Pro Phe Ala Thr Phe Met 
100 105 110 

GAA CAC TCT CGT TTG ATC CTC AAG GAC AAG CCT AAG AGC GTC 433 
Glu His Ser Arg Leu He Leu Lys Asp Lys Pro Lys Ser Val 



WO 98/46766 PCT/CA98/00313



115 120 125 

GAG TTC CAA ATG AGA ATC CTT GAA CGT TCT GGC CTC GGT GAG 475 
Glu Phe Gin Met Arg lie Leu Glu Arg Ser Gly Leu Gly Glu 
130 135 

GAG ACT TGT CTC CCT CCG GCT ATT CAT TAT ATT CCT CCC ACA 517 
Glu Thr Cys Leu Pro Pro Ala lie His Tyr lie Pro Pro Thr 
140 145 150 

CCA ACC ATG GAC GCG GCT AGA AGC GAG GCT CAG ATG GTT ATC 5 59 

Pro Thr Met Asp Ala Ala Arg Ser Glu Ala Gin Met Val lie 
155 160 165 

15 TTC GAG GCC ATG GAC GAT CTT TTC AAG AAA ACC GGT CTT AAA 601 
Phe Glu Ala Met Asp Asp Leu Phe Lys Lys Thr Gly Leu Lys 
170 175 180 

CCT AAA GAC GTC GAC ATC CTT ATC GTC AAC TGC TCT CTT TTC 643 
20 Pro Lys Asp Val Asp lie Leu lie Val Asn Cys Ser Leu Phe 
185 190 195 

TCT CCC ACA CCA TCG CTC TCA GCT ATG GTC ATC AAC AAA TAT 685 
Ser Pro Thr Pro Ser Leu Ser Ala Met Val lie Asn Lys Tyr 
25 200 205 

AAG CTT AGG AGT AAT ATC AAG AGC TTC AAT CTT TCG GGG ATG 727 
Lys Leu Arg Ser Asn lie Lys Ser Phe Asn Leu Ser Gly Met 
210 215 220 

30 

GGC TGC AGC GCG GGC CTG ATC TCA GTT GAT CTA GCC CGC GAC 769 
Gly Cys Ser Ala Gly Leu lie Ser Val Asp Leu Ala Arg Asp 
225 230 235 

35 TTG CTC CAA GTT CAT CCC AAT TCA AAT GCA ATC ATC GTC AGC 811 
Leu Leu Gin Val His Pro Asn Ser Asn Ala lie lie Val Ser 
240 245 250 

ACG GAG ATC ATA ACG CCT AAT TAC TAT CAA GGC AAC GAG AGA 853
Thr Glu Ile Ile Thr Pro Asn Tyr Tyr Gln Gly Asn Glu Arg
255 260 265

GCC ATG TTG TTA CCC AAT TGT CTC TTC CGC ATG GGT GCG GCA 895 
Ala Met Leu Leu Pro Asn Cys Leu Phe Arg Met Gly Ala Ala 
45 270 275 

GCC ATA CAC ATG TCA AAC CGC CGG TCT GAC CGG TGG CGA GCC 937
Ala Ile His Met Ser Asn Arg Arg Ser Asp Arg Trp Arg Ala
280 285 290

AAA TAC AAG CTT TCC CAC CTC GTC CGG ACA CAC CGT GGC GCT 979
Lys Tyr Lys Leu Ser His Leu Val Arg Thr His Arg Gly Ala
295 300 305
GAC GAC AAG TCT TTC TAC TGT GTC TAC GAA CAG GAA GAC AAA 1021
Asp Asp Lys Ser Phe Tyr Cys Val Tyr Glu Gln Glu Asp Lys
310 315 320 

GAA GGA CAC GTT GGC ATC AAC TTG TCC AAA GAT CTC ATG GCC 1063 
Glu Gly His Val Gly lie Asn Leu Ser Lys Asp Leu Met Ala 
325 330 335 

ATC GCC GGT GAA GCC CTC AAG GCA AAC ATC ACC ACA ATA GGT 1105 
lie Ala Gly Glu Ala Leu Lys Ala Asn lie Thr Thr He Gly 
340 345 

CCT TTG GTC CTA CCG GCG TCA GAA CAA CTT CTC TTC CTC ACG 1147 

Pro Leu Val Leu Pro Ala Ser Glu Gin Leu Leu Phe Leu Thr 
350 355 360 

TCC CTA ATC GGA CGT AAA ATC TTC AAC CCG AAA TGG AAA CCA 1189 

Ser Leu He Gly Arg Lys He Phe Asn Pro Lys Trp Lys Pro 
365 370 375 

TAC ATA CCG GAT TTC AAG CTG GCC TTC GAA CAC TTT TGC ATT 1231 
Tyr He Pro Asp Phe Lys Leu Ala Phe Glu His Phe Cys He 
380 385 390 

CAC GCA GGA GGC AGA GCG GTG ATC GAC GAG CTC CAA AAG AAT 12 73 
His Ala Gly Gly Arg Ala Val He Asp Glu Leu Gin Lys Asn 
395 400 405 

CTA CAA CTA TCA GGA GAA CAC GTT GAG GCC TCA AGA ATG ACA 1315 
Leu Gin Leu Ser Gly Glu His Val Glu Ala Ser Arg Met Thr 
410 415 

CTA CAT CGT TTT GGT AAC ACG TCA TCT TCA TCG TTA TGG TAC 13 57 
Leu His Arg Phe Gly Asn Thr Ser Ser Ser Ser Leu Trp Tyr 
420 425 430 

GAG CTT AGC TAC ATC GAG TCT AAA GGG AGA ATG AGG AGA GGC 13 99 
Glu Leu Ser Tyr He Glu Ser Lys Gly Arg Met Arg Arg Gly 
435 440 445 

GAT CGC GTT TGG CAA ATC GCG TTT GGG AGT GGT TTC AAG TGT 1441 
Asp Arg Val Trp Gin He Ala Phe Gly Ser Gly Phe Lys Cys 
450 455 460 

AAC TCT GCC GTG TGG AAA TGT AAC CGT ACG ATT AAG ACA CCT 14 83 
Asn Ser Ala Val Trp Lys Cys Asn Arg Thr He Lys Thr Pro 
465 470 475 

AAG GAC GGA CCA TGG TCC GAT TGT ATC GAC CGT TAC CCT GTC 1525
Lys Asp Gly Pro Trp Ser Asp Cys Ile Asp Arg Tyr Pro Val
480 485

TTT ATT CCC GAA GTT GTC AAA CTC TAA ACTGA 1557
Phe Ile Pro Glu Val Val Lys Leu

AAACGTCTTT GAACGGTTTA GTAACGGTTT GATTTTGTGT TACGGTTTAG 1607

GTTTATTTGG TCTCGGGATT TGGTTTAAAG GGGATTGAGA AATGGGAAGT 1657

TAGAGAGAAG AAAAAGCAAA GCATAAATGT TTGTATTTAA TTGCTCTGCA 1707

TATACTTAAA TCTCTGCTTT TCATTTGGGG TATTTTTTAG TTTCTCGTGC 1757

TGTAATTAAT AACTTGTGGT GTACTCAAAT AAGAATATTT CTCTCTGTTT 1807

(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1491
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATG CCT CAG GCA CCG ATG CCA GAG TTC TCT AGC TCG GTG 39
Met Pro Gln Ala Pro Met Pro Glu Phe Ser Ser Ser Val
1 5 10

AAG CTC AAG TAC GTG AAA CTT GGT TAC CAA TAT TTG GTT AAC 81 

25 Lys Leu Lys Tyr Val Lys Leu Gly Tyr Gin Tyr Leu Val Asn 

15 20 25 

CAT TTC TTG AGT TTT CTT TTG ATC CCG ATC ATG GCT ATT GTC 123 
His Phe Leu Ser Phe Leu Leu lie Pro lie Met Ala lie Val 
30 30 35 40 

GCC GTT GAG CTT CTT CGG ATG GGT CCT GAA GAG ATC CTT AAT 165 

Ala Val Glu Leu Leu Arg Met Gly Pro Glu Glu lie Leu Asn 
45 50 55 

35 

GTT TGG AAT TCA CTC CAG TTT GAC CTA GTT CAG GTT CTA TGT 2 07 

Val Trp Asn Ser Leu Gin Phe Asp Leu Val Gin Val Leu Cys 
60 65 

40 TCT TCC TTC TTT GTC ATC TTC ATC TCC ACT GTT TAC TTC ATG 249 
Ser Ser Phe Phe Val He Phe He Ser Thr Val Tyr Phe Met 
70 75 80 

TCC AAG CCA CGC ACC ATC TAC CTC GTT GAC TAT TCT TGT TAC 2 91 
45 Ser Lys Pro Arg Thr He Tyr Leu Val Asp Tyr Ser Cys Tyr 
85 90 95 

AAG CCA CCT GTC ACG TGT CGT GTC CCC TTC GCA ACT TTC ATG 333 
Lys Pro Pro Val Thr Cys Arg Val Pro Phe Ala Thr Phe Met 
50 100 105 110 

GAA CAC TCT CGT TTG ATC CTC AAG GAC AAG CCT AAG AGC GTC 3 75 
Glu His Ser Arg Leu He Leu Lys Asp Lys Pro Lys Ser Val 
115 120 125 
GAG TTC CAA ATG AGA ATC CTT GAA CGT TCT GGC CTC GGT GAG 417
Glu Phe Gln Met Arg Ile Leu Glu Arg Ser Gly Leu Gly Glu
130 135

GAG ACT TGT CTC CCT CCG GCT ATT CAT TAT ATT CCT CCC ACA 459
Glu Thr Cys Leu Pro Pro Ala Ile His Tyr Ile Pro Pro Thr
140 145 150

CCA ACC ATG GAC GCG GCT AGA AGC GAG GCT CAG ATG GTT ATC 501
Pro Thr Met Asp Ala Ala Arg Ser Glu Ala Gln Met Val Ile
155 160 165

TTC GAG GCC ATG GAC GAT CTT TTC AAG AAA ACC GGT CTT AAA 543 
15 Phe Glu Ala Met Asp Asp Leu Phe Lys Lys Thr Gly Leu Lys 
170 175 180 

CCT AAA GAC GTC GAC ATC CTT ATC GTC AAC TGC TCT CTT TTC 585 
Pro Lys Asp Val Asp He Leu He Val Asn Cys Ser Leu Phe 
20 185 190 195 

TCT CCC ACA CCA TCG CTC TCA GCT ATG GTC ATC AAC AAA TAT 62 7 

Ser Pro Thr Pro Ser Leu Ser Ala Met Val He Asn Lys Tyr 
200 205 

25 

AAG CTT AGG AGT AAT ATC AAG AGC TTC AAT CTT TCG GGG ATG 669 

Lys Leu Arg Ser Asn He Lys Ser Phe Asn Leu Ser Gly Met 

210 215 220 

GGC TGC AGC GCG GGC CTG ATC TCA GTT GAT CTA GCC CGC GAC 711
Gly Cys Ser Ala Gly Leu Ile Ser Val Asp Leu Ala Arg Asp
225 230 235

TTG CTC CAA GTT CAT CCC AAT TCA AAT GCA ATC ATC GTC AGC 753
Leu Leu Gln Val His Pro Asn Ser Asn Ala Ile Ile Val Ser
240 245 250

ACG GAG ATC ATA ACG CCT AAT TAC TAT CAA GGC AAC GAG AGA 795
Thr Glu Ile Ile Thr Pro Asn Tyr Tyr Gln Gly Asn Glu Arg
255 260 265

GCC ATG TTG TTA CCC AAT TGT CTC TTC CGC ATG GGT GCG GCA 837
Ala Met Leu Leu Pro Asn Cys Leu Phe Arg Met Gly Ala Ala
270 275

GCC ATA CAC ATG TCA AAC CGC CGG TCT GAC CGG TGG CGA GCC 879
Ala Ile His Met Ser Asn Arg Arg Ser Asp Arg Trp Arg Ala
280 285 290

AAA TAC AAG CTT TCC CAC CTC GTC CGG ACA CAC CGT GGC GCT 921
Lys Tyr Lys Leu Ser His Leu Val Arg Thr His Arg Gly Ala
295 300 305

GAC GAC AAG TCT TTC TAC TGT GTC TAC GAA CAG GAA GAC AAA 963
Asp Asp Lys Ser Phe Tyr Cys Val Tyr Glu Gln Glu Asp Lys
310 315 320

GAA GGA CAC GTT GGC ATC AAC TTG TCC AAA GAT CTC ATG GCC 1005
Glu Gly His Val Gly Ile Asn Leu Ser Lys Asp Leu Met Ala
325 330 335

ATC GCC GGT GAA GCC CTC AAG GCA AAC ATC ACC ACA ATA GGT 1047 
lie Ala Gly Glu Ala Leu Lys Ala Asn lie Thr Thr lie Gly 
10 340 345 

CCT TTG GTC CTA CCG GCG TCA GAA CAA CTT CTC TTC CTC ACG 1089 
Pro Leu Val Leu Pro Ala Ser Glu Gin Leu Leu Phe Leu Thr 
350 355 360 

15 

TCC CTA ATC GGA CGT AAA ATC TTC AAC CCG AAA TGG AAA CCA 1131
Ser Leu Ile Gly Arg Lys Ile Phe Asn Pro Lys Trp Lys Pro
365 370 375

TAC ATA CCG GAT TTC AAG CTG GCC TTC GAA CAC TTT TGC ATT 1173
Tyr Ile Pro Asp Phe Lys Leu Ala Phe Glu His Phe Cys Ile
380 385 390

CAC GCA GGA GGC AGA GCG GTG ATC GAC GAG CTC CAA AAG AAT 1215 
25 His Ala Gly Gly Arg Ala Val He Asp Glu Leu Gin Lys Asn 
395 400 405 

CTA CAA CTA TCA GGA GAA CAC GTT GAG GCC TCA AGA ATG ACA 125 7 
Leu Gin Leu Ser Gly Glu His Val Glu Ala Ser Arg Met Thr 
30 410 415 

CTA CAT CGT TTT GGT AAC ACG TCA TCT TCA TCG TTA TGG TAC 1299 

Leu His Arg Phe Gly Asn Thr Ser Ser Ser Ser Leu Trp Tyr 

420 425 430 

35 

GAG CTT AGC TAC ATC GAG TCT AAA GGG AGA ATG AGG AGA GGC 1341 

Glu Leu Ser Tyr He Glu Ser Lys Gly Arg Met Arg Arg Gly 
435 440 445 

40 GAT CGC GTT TGG CAA ATC GCG TTT GGG AGT GGT TTC AAG TGT 1383 
Asp Arg Val Trp Gin He Ala Phe Gly Ser Gly Phe Lys Cys 
450 455 460 

AAC TCT GCC GTG TGG AAA TGT AAC CGT ACG ATT AAG ACA CCT 1425 
45 Asn Ser Ala Val Trp Lys Cys Asn Arg Thr He Lys Thr Pro 
465 470 475 

AAG GAC GGA CCA TGG TCC GAT TGT ATC GAC CGT TAC CCT GTC 1467
Lys Asp Gly Pro Trp Ser Asp Cys Ile Asp Arg Tyr Pro Val
480 485

TTT ATT CCC GAA GTT GTC AAA CTC 1491
Phe Ile Pro Glu Val Val Lys Leu
(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 497
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Pro Gln Ala Pro Met Pro Glu Phe Ser Ser Ser Val Lys Leu Lys
1 5 10 15

Tyr Val Lys Leu Gly Tyr Gln Tyr Leu Val Asn His Phe Leu Ser Phe
20 25 30

Leu Leu Ile Pro Ile Met Ala Ile Val Ala Val Glu Leu Leu Arg Met
35 40 45

Gly Pro Glu Glu Ile Leu Asn Val Trp Asn Ser Leu Gln Phe Asp Leu
50 55 60

Val Gln Val Leu Cys Ser Ser Phe Phe Val Ile Phe Ile Ser Thr Val
65 70 75 80

Tyr Phe Met Ser Lys Pro Arg Thr Ile Tyr Leu Val Asp Tyr Ser Cys
85 90 95

Tyr Lys Pro Pro Val Thr Cys Arg Val Pro Phe Ala Thr Phe Met Glu
100 105 110

His Ser Arg Leu Ile Leu Lys Asp Lys Pro Lys Ser Val Glu Phe Gln
115 120 125

Met Arg Ile Leu Glu Arg Ser Gly Leu Gly Glu Glu Thr Cys Leu Pro
130 135 140

Pro Ala Ile His Tyr Ile Pro Pro Thr Pro Thr Met Asp Ala Ala Arg
145 150 155 160

Ser Glu Ala Gln Met Val Ile Phe Glu Ala Met Asp Asp Leu Phe Lys
165 170 175

Lys Thr Gly Leu Lys Pro Lys Asp Val Asp Ile Leu Ile Val Asn Cys
180 185 190

Ser Leu Phe Ser Pro Thr Pro Ser Leu Ser Ala Met Val Ile Asn Lys
195 200 205

Tyr Lys Leu Arg Ser Asn Ile Lys Ser Phe Asn Leu Ser Gly Met Gly
210 215 220

Cys Ser Ala Gly Leu Ile Ser Val Asp Leu Ala Arg Asp Leu Leu Gln
225 230 235 240

Val His Pro Asn Ser Asn Ala Ile Ile Val Ser Thr Glu Ile Ile Thr
245 250 255

Pro Asn Tyr Tyr Gln Gly Asn Glu Arg Ala Met Leu Leu Pro Asn Cys
260 265 270

Leu Phe Arg Met Gly Ala Ala Ala Ile His Met Ser Asn Arg Arg Ser
275 280 285

Asp Arg Trp Arg Ala Lys Tyr Lys Leu Ser His Leu Val Arg Thr His
290 295 300

Arg Gly Ala Asp Asp Lys Ser Phe Tyr Cys Val Tyr Glu Gln Glu Asp
305 310 315 320

Lys Glu Gly His Val Gly Ile Asn Leu Ser Lys Asp Leu Met Ala Ile
325 330 335

Ala Gly Glu Ala Leu Lys Ala Asn Ile Thr Thr Ile Gly Pro Leu Val
340 345 350

Leu Pro Ala Ser Glu Gln Leu Leu Phe Leu Thr Ser Leu Ile Gly Arg
355 360 365

Lys Ile Phe Asn Pro Lys Trp Lys Pro Tyr Ile Pro Asp Phe Lys Leu
370 375 380

Ala Phe Glu His Phe Cys Ile His Ala Gly Gly Arg Ala Val Ile Asp
385 390 395 400

Glu Leu Gln Lys Asn Leu Gln Leu Ser Gly Glu His Val Glu Ala Ser
405 410 415

Arg Met Thr Leu His Arg Phe Gly Asn Thr Ser Ser Ser Ser Leu Trp
420 425 430

Tyr Glu Leu Ser Tyr Ile Glu Ser Lys Gly Arg Met Arg Arg Gly Asp
435 440 445

Arg Val Trp Gln Ile Ala Phe Gly Ser Gly Phe Lys Cys Asn Ser Ala
450 455 460

Val Trp Lys Cys Asn Arg Thr Ile Lys Thr Pro Lys Asp Gly Pro Trp
465 470 475 480

Ser Asp Cys Ile Asp Arg Tyr Pro Val Phe Ile Pro Glu Val Val Lys
485 490 495

Leu
497
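
As a consistency check on the listing, the 1491-nucleotide open reading frame of SEQ ID NO: 3 corresponds to 497 codons, which matches the 497 residues listed for SEQ ID NO: 4. The following Python sketch translates the first 13 codons of SEQ ID NO: 3 using the standard genetic code. The function name `translate` and the compact construction of the codon table are illustrative choices and are not part of the specification.

```python
# Minimal sketch: translate a DNA open reading frame with the standard genetic code.
from itertools import product

BASES = "TCAG"
# Amino acids in the conventional TCAG codon order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    dna = dna.replace(" ", "").upper()
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 13 codons of SEQ ID NO: 3 as listed above.
orf_start = "ATG CCT CAG GCA CCG ATG CCA GAG TTC TCT AGC TCG GTG"
print(translate(orf_start))  # MPQAPMPEFSSSV

# Length check: 1491 nucleotides / 3 = 497 codons.
print(1491 // 3)  # 497
```

The one-letter output MPQAPMPEFSSSV corresponds to Met Pro Gln Ala Pro Met Pro Glu Phe Ser Ser Ser Val, the first 13 residues of SEQ ID NO: 4.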

45 

(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GTGCTTTATA TATGTTTG 18

(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

CGTCGGAGAG TTTTAATG 18

(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CTTCGATATC GGTTGTTG 18

(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

AAATACCCTA ATCACATTTT GTAA 24

(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

TTTAAACAGA GAGAAATATT CTTA 24

(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

ATGCCTCAGG CACCGATGCC AGAG 24

(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

CAGCACGAGA AACTAAAAAA TACC 24

(2) INFORMATION FOR SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1951
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TAGTGCTTTA TATATGTTTG ATACTTCTGT TTGGCAATAT CAATCATAGT 50 

AGAAAAGATA TGGACTTCAT TTGAGGTTTT TGGTGGATTG TGTCTATATG 100 

TGAAATCATG GGATCTCAAG ATTTGTCTGC ATTCAGTTTC CAAGTCAAAC 150

ATCGTAACTA CTGTTTGATT TTCCCTCATG CTTGCAGTTT TCATGGATAT 200

CTCAAGATTT GTCTTCTTGC ACTTTCCAAG TCAAACATAA AGTAACTACT 250

GATTGATATT CCCTCGTGTA TTACCCTCTT TCAAATGACA CAATTGGGCC 300

CAAGTAGAGG AATTTCATAG TGAATTCAAA AGATTAACTG TATTCCACCG 350

TCGTATTTTG ATAACATTTA GTTATTCCTT TTCTTTTTTT TCTTCTGCAA 400

CAGTTTTTTT TTAATACATT TAGTGTTGGT TTGGTTCAAT GAAATATTAT 450 

ATGTTACTTC TTTTTTTGGA AATAAATTAT TCATTCTTTC TACTATAAAA 500 

35 

GGAATTGTTC ATGCTTTTTT GATACAATAG TATACCATTT CAAAAGATAC 550 

CATAGACCAG TTATTACATG AATCGCCAAA ACAACACTAA AATCAGAAAA 600 

TCAGTATATT TTGGTATAGT CTCCAACATA CAATCATAAA ACCTCTGTGA 650

AATTTAAAAT CTATATTTGA CATTTCAAAG TTTAACAACA TAGTTCTAAA 700 

TAATTACCTA AATTTTAAGT CAAATGTGAA TTATATTTTA CTCTTCGATA 750 

45 

TCGGTTGTTG ACGATTAACC ATGCAAAAAA GAAACATTAA TTGCGAATGT 800 

AAATAACAAA ACATGTAACT CTTGTAGATA TACATGTATC GACATTTAAA 850 

50 CCCGAATATA TATGTATACC TATAATTTCT CTGATTTTCA CGCTACCTGC 900 

CACGTACATG GGTGATAGGT CCAAACTCAC AAGTAAAAGT TTACGTACAG 950 

TGAATTCGTC TTTTTGGGTA TAAACGTACA TTTAATTTAC ACGTAAGAAA 1000 
GGATTACCAA TTCTTTCATT TATGGTACCA GACAGAGTTA AGGCAAACAA 1050

GAGAAACATA TAGAGTTTTG ATATGTTTTC TTGGATAAAT ATTAAATTGA 1100 

5 

TGCAATATTT AGGGATGGAC ACAAGGTAAT ATATGCCTTT TAAGGTATAT 1150 

GTGCTATATG AATCGTTTCG CATGGGTACT AAAATTATTT GTCCTTACTT 1200 

10 TATATAAACA AATTCCAACA AAATCAAGTT TTTGCTAAAA CTAGTTTATT 1250 

TGCGGGTTAT TTAATTACCT ATCATATTAC TTGTAATATC ATTCGTATGT 1300 

TAACGGGTAA ACCAAACCAA ACCGGATATT GAACTATTAA AAATCTTGTA 1350 

15 

AATTTGACAC AAACTAATGA ATATCTAAAT TATGTTACTG CTATGATAAC 1400 

GACCATTTTT GTTTTTGAGA ACCATAATAT AAATTACAGG TACGTGACAA 1450 

20 GTACTAAGTA TTTATATCCA CCTTTAGTCA CAGTACCAAT ATTGCGCCTA 1500 

CCGGGCAACG TGAACGTGAT CATCAAATCA AAGTAGTTAC CAAACGCTTT 1550 

GATCTCGATA AAACTAAAAG CTGACACGTC TTGCTGTTTC TTAATTTATT 1600

25 

TCTCTTACAA CGACAATTTT GAGAAATATG AAATTTTTAT ATCGAAAGGG 1650 

AACAGTCCTT ATCATTTGCT CCCATCACTT GCTTTTGTCT AGTTACAACT 1700 

30 GGAAATCGAA GAGAAGTATT ACAAAAACAT TTTTCTCGTC ATTTATAAAA 1750 

AAATGACAAA AAATTAAATA GAGAGCAAAG CAAGAGCGTT GGGTGACGTT 1800 

GGTCTCTTCA TTAACTCCTC TCATCTACCC CTTCCTCTGT TCGCCTTTAT 1850 

35 

ATCCTTCACC TTCCCTCTCT CATCTTCATT AACTCATCTT CAAAAATACC 1900 

CTAATCACAT TTTGTAACAA TAATACAATT ATACATTAAA ACTCTCCGAC 1950 

40 G 1951 
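
The short sequences of SEQ ID NOs: 5 to 11 read like oligonucleotide primers, although that is an inference from the listing and not something stated in this section. Comparing them with the longer sequences shows, for example, that SEQ ID NO: 5 matches nucleotides 3 to 20 of SEQ ID NO: 12, and that SEQ ID NO: 6 is the reverse complement of its last 18 nucleotides (1934 to 1951). The Python sketch below reproduces that comparison on fragments copied from the listing. The helper names `reverse_complement` and `anneals_to` are hypothetical.

```python
# Minimal sketch: locate a short sequence on a template by exact or
# reverse-complement match. Helper names are illustrative only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (spaces ignored)."""
    return seq.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


def anneals_to(primer: str, template: str) -> str:
    """Report whether the primer matches the listed strand, its complement, or neither."""
    primer = primer.replace(" ", "").upper()
    template = template.replace(" ", "").upper()
    if primer in template:
        return "forward"
    if reverse_complement(primer) in template:
        return "reverse"
    return "none"


# Fragments copied from the listing above (not the full sequences).
seq12_start = "TAGTGCTTTA TATATGTTTG ATACTTCTGT"  # first 30 nt of SEQ ID NO: 12
seq12_end = "ATACATTAAA ACTCTCCGAC G"             # last 21 nt of SEQ ID NO: 12
seq5 = "GTGCTTTATA TATGTTTG"
seq6 = "CGTCGGAGAG TTTTAATG"

print(anneals_to(seq5, seq12_start))  # forward
print(anneals_to(seq6, seq12_end))    # reverse
```

The same check applied to the listing indicates that SEQ ID NOs: 8 and 10 match the first 24 nucleotides of SEQ ID NOs: 2 and 3 respectively, and that SEQ ID NOs: 9 and 11 are reverse complements of stretches near the 3' end of the listed sequences.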
Claims

We claim:

1. An isolated nucleic acid molecule that encodes a protein having very
long chain fatty acid elongase activity, wherein the nucleic acid molecule is selected
from the group consisting of:

(a) nucleic acids comprising at least 15 consecutive nucleotides of
the sequence set forth in Seq. I.D. No. 3;

(b) nucleic acids possessing at least 70% sequence identity with the
sequence set forth in Seq. I.D. No. 3; and

(c) nucleic acids that hybridize under conditions of at least 70%
stringency with the sequence set forth in Seq. I.D. No. 3.

2. An isolated nucleic acid molecule according to claim 1 wherein the
nucleic acid molecule comprises the sequence set forth in Seq. I.D. No. 3.

3. An isolated nucleic acid molecule according to claim 1 wherein the
nucleic acid molecule possesses at least 80% sequence identity with the sequence set
forth in Seq. I.D. No. 3.

4. An isolated nucleic acid molecule according to claim 1 wherein the
nucleic acid molecule hybridizes under conditions of at least 80% stringency with the
sequence set forth in Seq. I.D. No. 3.

5. A purified protein encoded by a nucleic acid molecule according to
claim 1.

6. A recombinant vector comprising a nucleic acid molecule according
to claim 1.
7. A recombinant vector according to claim 6 wherein the nucleic acid
molecule is in reverse orientation relative to an adjacent promoter sequence of the
vector.

8. A transgenic plant comprising a recombinant vector according to
claim 6.

9. A transgenic plant comprising a recombinant vector according to
claim 7.

10. A transgenic plant comprising a recombinant expression cassette
comprising a promoter sequence operably linked to a nucleic acid sequence selected
from the group consisting of:

(a) nucleic acids comprising at least 15 consecutive nucleotides of
the sequence set forth in Seq. I.D. No. 1;

(b) nucleic acids possessing at least 70% sequence identity with the
sequence set forth in Seq. I.D. No. 3; and

(c) nucleic acids that hybridize under conditions of at least 70%
stringency with the sequence set forth in Seq. I.D. No. 3.

11. A transgenic plant according to claim 10 wherein the nucleic acid
sequence comprises at least 30 consecutive nucleotides of the sequence set forth in Seq.
I.D. No. 1.

12. A transgenic plant according to claim 10 wherein the nucleic acid
sequence possesses at least 80% sequence identity with the sequence set forth in Seq.
I.D. No. 3.

13. A transgenic plant according to claim 10 wherein the nucleic acid
sequence hybridizes under conditions of at least 80% stringency with the sequence set
forth in Seq. I.D. No. 3.
14. A transgenic plant according to claim 10, wherein the plant has a
modified phenotype compared to a non-transgenic plant of the same species.

15. A transgenic plant according to claim 14 wherein the modified phenotype 
is a modified very long chain fatty acid composition. 

16. A transgenic plant according to claim 15 wherein the modified phenotype 
is a modified epicuticular wax layer. 

17. A transgenic plant according to claim 14 wherein the modified phenotype 
is modified seed oil composition. 

18. A transgenic plant according to claim 14 wherein the modified phenotype 
is conditional male sterility. 

19. A method of producing a plant with a modified very long chain fatty acid 
composition relative to a non-transgenic plant of the same species, comprising introducing 
into the plant a recombinant vector according to claim 6. 

20. A transgenic plant produced by the method of claim 19.

21. A transgenic plant produced by sexual or asexual propagation of a plant
according to claim 20 or the progeny of said plant.

22. An isolated nucleic acid molecule having a nucleotide sequence 
according to Seq. I.D. No. 3. 

23. An isolated nucleotide that encodes a protein having an amino acid 
sequence as shown in Seq. I.D. No. 4. 
24. A method of isolating a nucleic acid molecule encoding a plant very
long chain fatty acid elongation enzyme, the method comprising hybridizing a nucleic
acid preparation with a DNA molecule comprising at least 15 consecutive nucleotides
of the sequence set forth in Seq. I.D. No. 1.

25. An isolated nucleic acid molecule isolated according to the method
of claim 24.

26. A recombinant nucleic acid molecule comprising a promoter
sequence operably linked to a nucleic acid sequence, wherein the promoter sequence
comprises a CUT1 promoter.

27. A recombinant nucleic acid molecule according to claim 26 wherein
the promoter sequence comprises at least 50 consecutive nucleotides of the sequence
shown in Seq. I.D. No. 12.

28. A purified peptide having an amino acid sequence that is at least
70% identical to the sequence set forth in Seq. I.D. No. 4.
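
The claims above refer to percent sequence identity and to runs of at least 15 consecutive nucleotides, but this section does not state how either is to be measured. The Python sketch below is purely illustrative and is not the method of the specification. It assumes two sequences that are already aligned and of equal length, counts matching positions without gaps, and separately tests for a shared run of k consecutive nucleotides. In practice such comparisons are normally made with a sequence alignment program. The function names and the variant sequence are hypothetical.

```python
# Minimal sketch: naive measures related to the claim language (illustrative only).


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity over two pre-aligned, equal-length sequences."""
    a, b = a.replace(" ", "").upper(), b.replace(" ", "").upper()
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def shares_run(query: str, reference: str, k: int = 15) -> bool:
    """True if query contains at least k consecutive nucleotides of reference."""
    query = query.replace(" ", "").upper()
    reference = reference.replace(" ", "").upper()
    # Collect every k-long window of the reference, then scan the query.
    kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(query[i:i + k] in kmers for i in range(len(query) - k + 1))


reference = "ATG CCT CAG GCA CCG ATG CCA GAG"  # first 24 nt of Seq. I.D. No. 3
variant = "ATG CCT CAG GCA CCG ATG CCA GAA"    # hypothetical: last base changed
print(percent_identity(reference, variant))    # 23 of 24 positions match
print(shares_run(variant, reference))          # True: the first 23 nt are shared
```

An ungapped count of this kind ignores insertions and deletions, so it would understate the similarity of sequences that differ by an indel. It is shown only to make the arithmetic behind a percentage threshold concrete.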
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